Contract intelligence at scale: what we learned from 2M documents

Lessons from extracting structured data from two million contracts. What works, what doesn't, and why accuracy matters more than speed.

Over the past three years, we have processed just over two million contracts through Clad—vendor agreements, employment contracts, NDAs, leases, partnership agreements, and several categories we cannot discuss publicly. The lessons from that volume have changed how we think about contract intelligence.

The most important lesson: extraction is not a solved problem. Nearly every vendor in this space claims 95%+ accuracy, but when you ask what that accuracy is measured on, the answer is usually "entity extraction" or "clause classification": tasks that sound impressive but do not, on their own, answer the questions contract analytics exists to answer.

What matters for contract intelligence is whether you can answer questions like: Which contracts have auto-renewal clauses? Which ones allow termination for convenience? Which ones have uncapped liability? And can you answer those questions with enough accuracy that a general counsel will stake their legal opinion on the results?

Contract extraction is useful only when the output is accurate enough to base decisions on. Anything less is expensive pattern recognition.

What we measure

We measure extraction accuracy on decision-relevant fields. For a commercial contract, that means: contract value, payment terms, renewal terms, termination provisions, liability caps, indemnification scope, governing law, and amendment requirements.

For each field, we calculate precision (of the clauses we flagged as containing this provision, how many actually do?) and recall (of all the clauses that contain this provision, how many did we find?). We do not ship a model until both precision and recall are above 92% on a held-out test set.
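As a rough illustration of that gate, here is a minimal Python sketch. The names (FieldCounts, passes_gate) and the counts in the example are hypothetical and chosen only to show the arithmetic; this is not Clad's actual evaluation code.

```python
from dataclasses import dataclass


@dataclass
class FieldCounts:
    """Per-field tallies from a held-out test set (hypothetical structure)."""
    true_positives: int   # clauses flagged that really contain the provision
    false_positives: int  # clauses flagged that do not contain it
    false_negatives: int  # clauses containing the provision that were missed


def precision(c: FieldCounts) -> float:
    # Of the clauses we flagged, what fraction actually contain the provision?
    flagged = c.true_positives + c.false_positives
    return c.true_positives / flagged if flagged else 0.0


def recall(c: FieldCounts) -> float:
    # Of the clauses that contain the provision, what fraction did we find?
    actual = c.true_positives + c.false_negatives
    return c.true_positives / actual if actual else 0.0


def passes_gate(c: FieldCounts, threshold: float = 0.92) -> bool:
    # Both metrics must be strictly above the threshold.
    return precision(c) > threshold and recall(c) > threshold


# Illustrative, made-up counts for two fields.
liability_cap = FieldCounts(true_positives=940, false_positives=60, false_negatives=40)
auto_renewal = FieldCounts(true_positives=900, false_positives=50, false_negatives=100)

for name, counts in [("liability_cap", liability_cap), ("auto_renewal", auto_renewal)]:
    print(name, round(precision(counts), 3), round(recall(counts), 3), passes_gate(counts))
```

With these made-up counts, the second field clears the bar on precision but not on recall (900 of 1,000 found), so it would be held back. Requiring both metrics to pass is what stops a model from looking good by flagging very little.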

This is harder than it sounds. Contracts are written by humans who use different language for the same concept, bury important terms in subordinate clauses, and sometimes contradict themselves across different sections. A model that works well on NDAs can fail badly on joint venture agreements.

The schema problem

Most contract intelligence platforms use a fixed schema: they extract the same set of fields from every contract. This works for high-volume, standardized agreements like NDAs or employment offers. It breaks down for bespoke agreements where the fields that matter are unique to the deal.

We built Clad to support custom schemas. The client defines what they care about, we fine-tune the extraction models on their specific contract types, and we validate accuracy on a labeled sample before running the full corpus. This takes longer than off-the-shelf extraction, but it produces results the client can actually use.
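To make the idea of a client-defined schema concrete, here is a small Python sketch. The FieldSpec structure, the field names, and the check_record helper are all assumptions made for illustration; they do not describe Clad's real schema format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One client-defined field (hypothetical structure)."""
    name: str
    value_type: type
    required: bool = True


def check_record(schema: list[FieldSpec], record: dict) -> list[str]:
    """Return a list of problems found when comparing a record to the schema."""
    problems = []
    known = {spec.name for spec in schema}
    for spec in schema:
        value = record.get(spec.name)
        if value is None:
            # Absent optional fields are fine; absent required fields are not.
            if spec.required:
                problems.append(f"missing required field: {spec.name}")
            continue
        if not isinstance(value, spec.value_type):
            problems.append(
                f"wrong type for {spec.name}: expected {spec.value_type.__name__}"
            )
    for key in record:
        if key not in known:
            problems.append(f"unexpected field: {key}")
    return problems


# A made-up schema for a bespoke joint venture agreement.
jv_schema = [
    FieldSpec("exclusivity_territory", str),
    FieldSpec("change_of_control_consent_required", bool),
    FieldSpec("liability_cap_usd", float, required=False),
]

# A made-up extraction result with one mistyped field and one extra field.
record = {
    "exclusivity_territory": "EMEA",
    "change_of_control_consent_required": "yes",
    "governing_law": "Delaware",
}

for problem in check_record(jv_schema, record):
    print(problem)
```

A structural check like this only confirms that the output has the shape the client asked for. Whether the extracted values are correct still has to be measured against the labeled sample.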

Why this matters for M&A

The most common use case for contract intelligence is M&A diligence. The acquiring company wants to understand what obligations they are inheriting: What are the revenue commitments? Are there change-of-control provisions that could trigger renegotiation? Are there guarantees or indemnities that survive closing?

Getting this wrong is expensive. We have seen deals where post-close analysis revealed liabilities that were not flagged during diligence, because the extraction system missed a clause or misinterpreted its scope. The cost of that error—renegotiating terms, unwinding the deal, or eating the liability—far exceeds the cost of doing the extraction correctly in the first place.

— Henry
