Pharmaceutical Medical Claim Extraction Engine


Precision
Document AI.

A 15-person review team couldn’t keep up with the
claim volume. We built a document AI pipeline aligned
with 21 CFR Part 11 and GAMP 5 v2 on LayoutLMv3, Donut
and ColPali, with a spaCy NER fallback. 93.7% claim F1,
8,500+ claims a day, under 2% needing a human.


Industry: Pharma / Regulatory
Timeline: 10 weeks
Key result: 93.7% extraction F1, $1.8M savings
Tech stack: LayoutLMv3, Donut, ColPali, VLM (Claude Sonnet 4 vision), spaCy 3.8 NER fallback, Python, FastAPI, PostgreSQL, 21 CFR Part 11 + GAMP 5 v2 + PCCPs

We shipped an FDA-aligned extraction engine that now processes 8,500+ claims a day at 93.7% end-to-end claim F1 (96.1% entity F1, 91.4% relation F1) and 99.2% field coverage. Under 2% of claims escalate to a human, and the client retired a 15-person manual review queue.
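For readers less familiar with the metric, here is a minimal sketch of how an exact-match F1 is commonly computed over sets of predicted and gold items. The function name, the (type, text) tuple layout and the sample data are illustrative assumptions, not the project's evaluation harness or its data.

```python
def f1_score(predicted: set, gold: set) -> float:
    """Exact-match F1 between a set of predicted items and a set of gold items."""
    if not predicted or not gold:
        return 0.0
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # share of predictions that are right
    recall = true_positives / len(gold)          # share of gold items that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical entities as (type, text) pairs; the predicted duration is wrong.
gold_entities = {("ANALYTE", "LDL"), ("EFFECT", "18%"), ("DURATION", "12 weeks")}
pred_entities = {("ANALYTE", "LDL"), ("EFFECT", "18%"), ("DURATION", "12 months")}

entity_f1 = f1_score(pred_entities, gold_entities)
```

The same function could be applied at the entity, relation or claim level; what changes is the definition of an "item" and how strictly two items must match.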

LayoutLMv3 handles document layout, Donut reads free-form documents without a separate OCR step, ColPali drives visual document retrieval against the source-of-truth library, and a vision-language model (Claude Sonnet 4 vision) does the high-precision extraction, with spaCy 3.8 NER as a structured-text fallback. FastAPI and PostgreSQL handle serving and audit logging. The result: $1.8M a year in labor saved, with a validation layer the regulator could read.

AI Delivery Approach

Regulatory first, model second: We mapped the FDA claim taxonomy and validation rules to 21 CFR Part 11, GAMP 5 v2 and a Predetermined Change Control Plan (PCCP) before writing the pipeline. The model had to serve the spec, not the other way around.
Layout + entities + relations + fields: The pipeline runs LayoutLMv3 layout understanding, then entity recognition, then relation extraction, then structured field mapping, so a claim like “reduced LDL by 18% over 12 weeks” comes out as typed data, not a string. Each stage has its own F1 in the eval report.
Pharmacologists in the loop: Domain experts reviewed extractions against source PDFs through a dedicated tool. Their corrections went straight back into the training set.
Audit-ready serving: Every field carries a confidence score, every claim has a full audit log, and exception handling escalates instead of guessing. The review team trusts the low-confidence flag.
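To make the "typed data, not a string" idea concrete, here is a minimal sketch of what a structured record for the example claim could look like. The `Claim` dataclass, its field names and the regular expression are hypothetical illustrations. The production pipeline relies on learned models for these stages, and a regex stands in here only to keep the example self-contained.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    """Hypothetical typed record for a single efficacy claim."""
    direction: str        # e.g. "reduced" or "increased"
    analyte: str          # what was measured, e.g. "LDL"
    magnitude: float      # numeric size of the effect
    magnitude_unit: str   # unit of the effect, e.g. "%"
    duration: int         # length of the observation window
    duration_unit: str    # e.g. "weeks"


# Deliberately narrow pattern: it only covers phrasing like the example claim.
_CLAIM_PATTERN = re.compile(
    r"(?P<direction>reduced|increased)\s+(?P<analyte>[A-Za-z0-9-]+)\s+by\s+"
    r"(?P<magnitude>\d+(?:\.\d+)?)(?P<magnitude_unit>%)\s+over\s+"
    r"(?P<duration>\d+)\s+(?P<duration_unit>days|weeks|months)"
)


def parse_claim(text: str) -> Optional[Claim]:
    """Return a typed Claim, or None when the text does not match the pattern."""
    match = _CLAIM_PATTERN.search(text)
    if match is None:
        return None
    return Claim(
        direction=match["direction"],
        analyte=match["analyte"],
        magnitude=float(match["magnitude"]),
        magnitude_unit=match["magnitude_unit"],
        duration=int(match["duration"]),
        duration_unit=match["duration_unit"],
    )


claim = parse_claim("reduced LDL by 18% over 12 weeks")
```

Keeping the unit and the duration as separate typed fields is what lets a downstream validator catch a wrong unit or a misread time window, which a plain string would hide.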

What was actually hard

Claim documents come in dozens of formats and the language is precise in ways a general-purpose NER model doesn’t respect. A wrong unit, a missed qualifier, or a misread cohort and the whole submission is wrong. We had to hit a regulator-grade accuracy bar and ship extractions a pharmacologist could defend line by line.


Project Outcome

The pipeline landed at 93.7% F1 and held up under pharmacologist review. Downstream submission prep got faster and more consistent across teams, and the review queue now only sees the hard claims instead of all of them.
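The routing that keeps only the hard claims in the review queue can be pictured as a confidence gate. The sketch below is an illustration under stated assumptions: the 0.90 threshold, the field names and the `route_claim` helper are hypothetical, and the source does not describe the actual thresholds or routing logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FieldExtraction:
    """Hypothetical extracted field with a model confidence in [0, 1]."""
    name: str
    value: str
    confidence: float


REVIEW_THRESHOLD = 0.90  # assumed value, chosen only for illustration


def route_claim(fields: List[FieldExtraction],
                threshold: float = REVIEW_THRESHOLD) -> str:
    """Auto-accept only if every field clears the threshold; otherwise escalate."""
    if not fields:
        # Nothing was extracted, so there is nothing to trust: send to a human.
        return "escalate"
    weakest = min(field.confidence for field in fields)
    return "auto_accept" if weakest >= threshold else "escalate"


confident = [
    FieldExtraction("analyte", "LDL", 0.99),
    FieldExtraction("magnitude", "18%", 0.97),
]
uncertain = [
    FieldExtraction("analyte", "LDL", 0.99),
    FieldExtraction("duration", "12 weeks", 0.62),
]
```

Gating on the weakest field rather than the average reflects the "escalate instead of guessing" principle: one doubtful unit or qualifier is enough to send the whole claim to a reviewer.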

93.7% extraction F1 score
8,500+ claims per day
$1.8M annual labor savings
<2% human escalation


LayoutLMv3, Donut, ColPali, Claude Sonnet 4 vision, spaCy 3.8, Python, FastAPI, PostgreSQL, 21 CFR Part 11, GAMP 5 v2, PCCPs

“ImmovableTech didn't just deliver an AI system — they deeply understood our regulatory requirements and built a pipeline we could actually trust in production.”