You just accepted a new case. Retaining counsel sends you a 10 GB document dump — thousands of PDFs spanning engineering specs, test results, patent filings, and internal correspondence. You need to sound intelligent in an interview next week. The clock is ticking. Do you reach for ChatGPT?
If your answer is “yes,” you may be walking into a career-altering trap. In courtrooms across the country, expert witnesses and attorneys are discovering — the hard way — that the AI tools they rely on are not the private workspaces they assumed them to be. But there is a better path, and it’s one that plays directly to the technical strengths you already have.
The Wreckage Is Already Piling Up
The past two years have produced a sobering string of AI-related litigation disasters. Courts are not merely cautioning practitioners — they are sanctioning them, striking their declarations, and destroying their professional credibility.
In Mata v. Avianca, lawyers submitted fabricated citations generated by ChatGPT and were formally sanctioned. In Kohls v. Ellison, an AI safety expert — of all people — had his entire declaration excluded after GPT-generated false citations “shattered his credibility.” And in Concord Music Group v. Anthropic, an expert’s declaration was partially struck for an acknowledged “AI hallucination.”
These are not hypothetical risks. They are published opinions, and opposing counsel in your next case has read every one of them.
AI Is a Third Party, Not a Private Workspace
The deeper problem goes beyond hallucinated citations. When you paste confidential case material into a cloud-based LLM, you are transmitting it to a third party. Courts are now spelling out the legal consequences of that act.
In United States v. Heppner, the court ruled that AI chat logs were fully discoverable — no attorney-client privilege, no work-product protection. The formula is straightforward: self-directed use + public platform terms = third-party disclosure with zero privilege. Every prompt you type, every strategy you explore, every draft you refine through a public LLM becomes fertile ground for opposing counsel’s discovery requests.
Cloud Providers Take Security Seriously, but Fall Short
You might reasonably ask: “What about enterprise cloud AI? Surely AWS, Azure, and OpenAI’s business tiers solve this?” They try. These platforms invest heavily in security certifications, encryption, and contractual assurances.
But the protective orders governing your case documents don’t care about SOC 2 badges. A typical protective order in federal litigation enumerates exactly who may access confidential material — and none of the ten standard exceptions covers uploading that material to a cloud LLM, no matter how many compliance certifications the provider holds. Meanwhile, cloud infrastructure breaches continue making headlines, and the gap between vendor promises and courtroom standards remains wide open.
Local Inference: A Solution Built for Litigation
Here’s where things get exciting — and where your technical background becomes a genuine competitive advantage. The same AI capabilities that make cloud LLMs so useful can now run entirely on hardware you control, in your own office, with no data ever leaving your local network.
This isn’t a theoretical exercise. Modern hardware — specifically Apple’s Mac Studio with the M3 Ultra chip and 512 GB of unified memory — can run state-of-the-art large language models locally with performance that would have been unthinkable two years ago. The key enabling technology is a class of models called Mixture-of-Experts (MoE) architectures, like Qwen3-235B-A22B. These models contain 235 billion parameters but activate only 22 billion for any given query, delivering GPT-4-class analytical capability while fitting comfortably in local memory.
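If the phrase "conditional computation" sounds abstract, a few lines of code make it concrete. The following is a toy sketch of a Mixture-of-Experts layer: only the top-scoring experts are ever multiplied against a given token, which is why a 235-billion-parameter model can answer with roughly a tenth of that work per query. The dimensions and routing here are illustrative, not the real model's.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8      # toy expert pool (the real model routes among far more)
TOP_K = 2            # experts activated per token in this sketch
HIDDEN = 16          # toy hidden dimension

# In this sketch each "expert" is just a small feed-forward weight matrix.
experts = [rng.normal(size=(HIDDEN, HIDDEN)) for _ in range(NUM_EXPERTS)]
router = rng.normal(size=(HIDDEN, NUM_EXPERTS))  # learned weights in a real model

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route a single token through only its top-k experts."""
    scores = x @ router                        # one routing score per expert
    top = np.argsort(scores)[-TOP_K:]          # indices of the k best experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                   # softmax over the chosen experts
    # Only TOP_K of NUM_EXPERTS weight matrices are touched for this token.
    return sum(w * (x @ experts[i]) for i, w in zip(top, weights))

token = rng.normal(size=HIDDEN)
print(moe_layer(token).shape)  # (16,) -- full-size output, a fraction of the compute
```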
How the Numbers Work
At Q4 quantization (a compression technique that preserves analytical quality while reducing memory footprint), Qwen3-235B requires roughly 130 GB of RAM. The M3 Ultra’s 800 GB/s memory bandwidth enables 16–24 tokens per second — fast enough for real-time question answering and overnight batch processing of large document sets. Add a 1 GB embedding model (nomic-embed-text) for semantic search, and you still have hundreds of gigabytes of headroom.
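The back-of-envelope arithmetic behind those figures is worth seeing, if only so you can defend the hardware purchase. A rough sketch follows; the constants are approximations, and real throughput depends on the runtime, context length, and quantization scheme.

```python
# Rough sizing math for a Q4-quantized MoE model on a 512 GB Mac Studio.
# All figures are planning approximations, not benchmarks.

TOTAL_PARAMS = 235e9        # Qwen3-235B-A22B: total parameters
ACTIVE_PARAMS = 22e9        # parameters actually used per token
BITS_PER_PARAM = 4.5        # ~Q4 quantization, including overhead
MEM_BANDWIDTH_GBS = 800     # M3 Ultra unified-memory bandwidth (GB/s)
TOTAL_RAM_GB = 512

weights_gb = TOTAL_PARAMS * BITS_PER_PARAM / 8 / 1e9
active_gb = ACTIVE_PARAMS * BITS_PER_PARAM / 8 / 1e9

# Decoding is memory-bandwidth-bound: every generated token streams the active
# weights through memory at least once, so bandwidth / active bytes gives a
# theoretical ceiling. Real systems land well below it.
ceiling_tps = MEM_BANDWIDTH_GBS / active_gb

print(f"Model weights at Q4:  ~{weights_gb:.0f} GB")    # ~132 GB
print(f"Active weights/token: ~{active_gb:.0f} GB")     # ~12 GB
print(f"Throughput ceiling:   ~{ceiling_tps:.0f} tok/s (observed: 16-24)")
print(f"Headroom for OS, KV cache, vector DB: ~{TOTAL_RAM_GB - weights_gb:.0f} GB")
```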
Compliance by Design: The Architecture
A well-designed local AI system isn’t just “an LLM on a laptop.” It’s a complete document-processing pipeline engineered for litigation workflows.

The workflow begins when you receive documents from retaining counsel. Files are downloaded into a dedicated per-case directory. PDFs are OCR-processed (using engines like Surya OCR that achieve 97%+ accuracy on technical documents with complex layouts, tables, and equations), images are optimized, and text is extracted. The processed content is then indexed into a per-case vector database — a semantic search index that lets you ask natural-language questions of your entire document corpus and get cited, sourced answers in seconds.
Every step runs on your local hardware. No cloud. No third-party transmission. Complete protective-order compliance by design, not by contract.
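To make the shape of such a pipeline concrete, here is a minimal sketch using Ollama for local embeddings and chat, and Chroma as the per-case vector store. The case directory, the `extract_text` helper (a stand-in for whatever OCR engine you use, Surya or otherwise), and the chunk-free indexing are all simplifying assumptions; treat this as a starting point, not a turnkey system.

```python
# pip install ollama chromadb  -- both run entirely on local hardware
from pathlib import Path
import ollama      # client for a locally running Ollama server (no cloud calls)
import chromadb    # embedded vector database, stored on local disk

CASE_DIR = Path("cases/acme_v_widgetco")   # hypothetical per-case directory
client = chromadb.PersistentClient(path=str(CASE_DIR / "vector_db"))
collection = client.get_or_create_collection("case_documents")

def extract_text(pdf_path: Path) -> str:
    """Stand-in for the OCR step (e.g. Surya): returns plain text per document."""
    raise NotImplementedError("wire up your OCR engine here")

def embed(text: str) -> list[float]:
    """Embed text with a locally served nomic-embed-text model."""
    return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

# --- Indexing: every produced document goes into the per-case store ---
for pdf in (CASE_DIR / "productions").glob("*.pdf"):
    text = extract_text(pdf)
    collection.add(
        ids=[pdf.stem],
        documents=[text],
        embeddings=[embed(text)],
        metadatas=[{"source": pdf.name}],
    )

# --- Querying: natural-language questions, answers cite their sources ---
question = "What fatigue testing was performed on the hinge assembly?"
hits = collection.query(query_embeddings=[embed(question)], n_results=5)
context = "\n\n".join(hits["documents"][0])
sources = [m["source"] for m in hits["metadatas"][0]]

answer = ollama.chat(
    model="qwen3:235b",   # local MoE model; any capable local chat model works here
    messages=[{"role": "user",
               "content": f"Answer from these excerpts only:\n{context}\n\nQ: {question}"}],
)
print(answer["message"]["content"], "\nSources:", sources)
```

A production pipeline would chunk long documents before embedding and carry page-level metadata so citations point to specific pages; the sketch skips that for brevity.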
An Agent Architecture That Actually Works
The real power emerges when you pair local inference with an agent framework — software that turns a raw LLM into a persistent, capable research assistant. Platforms like OpenClaw provide the scaffolding: persistent memory across sessions, tool use (file system access, web search, code execution), scheduled background tasks, and the ability to spawn specialized sub-agents.

Consider a working system with four specialized agents. An orchestrator (Skuld) receives your questions via a messaging interface and intelligently routes them. Case-document queries go to Oracle, which runs Qwen3-235B locally and searches your per-case vector database to deliver answers with source citations. Code-generation tasks go to CodeWriter, running a separate local model (MiniMax-M2.5) for programming work. A quality-control agent reviews code output — and critically, this is the only agent that touches the cloud, and it handles only code, never case documents.
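To see how such routing might look in code, here is a toy sketch. The agent names are borrowed from the example above, but the keyword-based routing and the compliance flag are illustrative assumptions about how an orchestrator could enforce the rule that case material never reaches a cloud-enabled agent.

```python
# A toy router in the spirit of the orchestrator described above. The routing
# rules and the cloud_allowed flag are illustrative, not the actual system.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    name: str
    model: str            # which local model backs this agent
    cloud_allowed: bool   # hard compliance flag checked at dispatch time
    handle: Callable[[str], str]

def oracle_handler(q: str) -> str:
    return f"[Oracle / Qwen3-235B] searching per-case vector DB for: {q!r}"

def codewriter_handler(q: str) -> str:
    return f"[CodeWriter / MiniMax] generating code for: {q!r}"

AGENTS = {
    "documents": Agent("Oracle", "qwen3-235b", cloud_allowed=False, handle=oracle_handler),
    "code":      Agent("CodeWriter", "minimax-m2.5", cloud_allowed=False, handle=codewriter_handler),
}

def skuld_route(message: str) -> str:
    """Crude keyword routing; a real orchestrator would classify with the LLM itself."""
    task = "code" if any(w in message.lower() for w in ("write", "script", "function")) else "documents"
    agent = AGENTS[task]
    # Compliance invariant: anything touching case documents must stay local.
    assert not agent.cloud_allowed, "case material must never reach a cloud-enabled agent"
    return agent.handle(message)

print(skuld_route("Summarize the fatigue test reports for the hinge assembly"))
print(skuld_route("Write a script to plot the strain-gauge data"))
```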

The result is an AI research assistant you can interact with as naturally as texting a colleague: it searches thousands of documents in seconds, generates analyses and code on demand, and maintains ironclad compliance with your protective-order obligations.
Why This Matters for You
If you’re an expert witness with a technical background, you’re uniquely positioned to benefit from this technology. You already understand the underlying concepts — quantization is just data compression, vector databases are just high-dimensional indexing, MoE architectures are just conditional computation. The tools are mature, the hardware is commercially available, and the legal imperative is clear.
- Speed: Digest a 10 GB document dump in hours, not weeks
- Compliance: Protective-order materials never leave your network
- Credibility: Source-cited answers keep AI hallucinations from entering your work product unchecked
- Capability: GPT-4-class analysis running on a machine under your desk
- Independence: No subscription dependencies, no terms-of-service changes, no vendor lock-in
The expert witnesses who master this technology won’t just avoid the pitfalls that are ending careers — they’ll deliver faster, deeper, and more defensible analyses than their peers. In a field where credibility is everything, that advantage compounds with every case.
Ready to Explore Local AI for Your Practice?
Whether you’re evaluating hardware, selecting models, or designing a document-processing pipeline for your next case, I’m happy to discuss how local inference can fit your specific workflow and compliance requirements.
Consulting available on system architecture, model selection, and litigation-compliant AI deployment.