Document Processing
& Data Extraction.

Invoices, contracts, reports, forms — unstructured documents go in, clean structured data comes out. Here's how we actually build the agent system that does it.

Typical build time: 4–8 weeks
Agent count: 5–7 agents
Accuracy target: 98%+ on extraction
Volume per month: 100 → 100,000+

Six agents.
One job.

Each agent has one role and a clear handoff. No giant prompts trying to do everything — small, testable units that an orchestrator coordinates.

Agent 01
Intake
Watcher · Classifier

Watches the inbox, validates format, deduplicates, and classifies the document type.

Tools
IMAP / Gmail API · MIME parser · Hash dedup · Claude (vision)
Agent 02
Extraction
OCR · Parser

Pulls every field into a strict schema and confidence-scores each value.

Tools
Claude · GPT-4o · PDF.js · Tesseract OCR · JSON schema
Agent 03
Validation
Rule engine · QA

Cross-checks fields against business rules. Flags anomalies before a human sees them.

Tools
Supabase RLS · Rule DSL · Fuzzy match · Claude (reasoning)
Agent 04
Routing
Decision · Dispatch

Auto-approves, rejects, or routes to the right human based on your rules.

Tools
Workflow graph · Slack API · Email queue
Agent 05
Filing
Sync · Archive

Posts the data to its destination and archives the original under the correct naming convention.

Tools
Xero API · Drive API · HubSpot · Webhooks
Agent 00
Orchestrator
Supervisor · Logger

Coordinates handoffs, retries on failure, logs every action for full auditability.

Tools
State machine · Postgres log · Retry policy
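To make the orchestrator's job concrete, here is a minimal Python sketch of a supervisor that runs stages in order, retries failures with exponential backoff, and logs every action. It also does intake-style deduplication by SHA-256 hash. This is a simplified illustration, not the production implementation: the `Orchestrator` class, the stage functions, and the in-memory audit log are hypothetical stand-ins. A real build would persist the log (for example to Postgres) and call actual extraction models.

```python
import hashlib
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical stage signature: takes the shared state dict, returns an updated one.
Stage = Callable[[dict], dict]


@dataclass
class Orchestrator:
    """Minimal sketch: run stages in order, retry failures, log every action."""
    stages: list  # list of (name, Stage) pairs
    max_attempts: int = 3
    backoff_s: float = 0.0  # base delay in seconds, doubled after each failed attempt
    audit_log: list = field(default_factory=list)
    seen_hashes: set = field(default_factory=set)

    def _log(self, doc_id: str, stage: str, status: str, detail: str = "") -> None:
        self.audit_log.append(
            {"doc": doc_id[:12], "stage": stage, "status": status, "detail": detail})

    def run(self, raw: bytes) -> Optional[dict]:
        # Intake-style dedup: identical bytes hash to the same SHA-256 digest.
        doc_id = hashlib.sha256(raw).hexdigest()
        if doc_id in self.seen_hashes:
            self._log(doc_id, "intake", "duplicate")
            return None
        self.seen_hashes.add(doc_id)
        self._log(doc_id, "intake", "ok")

        state = {"doc_id": doc_id, "raw": raw}
        for name, stage in self.stages:
            for attempt in range(1, self.max_attempts + 1):
                try:
                    state = stage(state)
                    self._log(doc_id, name, "ok", f"attempt {attempt}")
                    break
                except Exception as exc:
                    self._log(doc_id, name, "error", f"attempt {attempt}: {exc}")
                    if attempt == self.max_attempts:
                        self._log(doc_id, name, "failed")
                        return None  # out of retries: hand off to a human
                    time.sleep(self.backoff_s * 2 ** (attempt - 1))
        return state


# Usage: a hypothetical extraction stage that times out once, then succeeds on retry.
calls = {"extract": 0}

def extract(state: dict) -> dict:
    calls["extract"] += 1
    if calls["extract"] == 1:
        raise TimeoutError("model timeout")
    return {**state, "fields": {"vendor": "ACME Corp", "total": 4200.00}}

def validate(state: dict) -> dict:
    if state["fields"]["total"] <= 0:
        raise ValueError("total must be positive")
    return state

orch = Orchestrator(stages=[("extraction", extract), ("validation", validate)])
result = orch.run(b"%PDF-1.7 invoice-Q2-3119")
print(result["fields"])                       # {'vendor': 'ACME Corp', 'total': 4200.0}
print(orch.run(b"%PDF-1.7 invoice-Q2-3119"))  # None: duplicate
print([(e["stage"], e["status"]) for e in orch.audit_log])
```

This sketch retries every exception. In practice you would likely retry only transient failures such as timeouts and send deterministic validation failures straight to a human.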

A €4,200 invoice
arrives at 09:14.

ACME Corp emails an invoice. Nobody opens the email. By 09:14:42 it's reconciled, filed, and waiting on Sarah's approval. Here's the trace.

Live trace: invoice-Q2-3119.pdf
Orchestrator (Supervisor · State machine) coordinating handoffs: Intake (Watcher) → Extraction (OCR · Parser) → Validation (Rules · QA) → Routing (Decision) → Filing (Sync)

What we need from you.

Most builds take 4–8 weeks. The faster you can give us these, the faster we ship. We sign an NDA before any of this changes hands.

📥
15–30 sample documents
Real ones, not synthetic. Include a mix of clean cases and edge cases; the edge cases are where most of the engineering time goes.
📋
Your business rules
Approval thresholds, vendor whitelist, and what gets auto-approved vs. sent to human review. We turn these into testable rules.
🔌
API access
Read access to where docs come from (inbox, drive) and write access to where data goes (CRM, accounting).
👤
A point person
One human who can answer "is this edge case real or noise?" during the build. ~2 hours/week for 3 weeks.
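Here's what "we turn these into testable rules" can look like in practice. This is a minimal Python sketch that assumes each extracted field arrives as a (value, confidence) pair. The `Rules` and `route` names, the 1,000 approval limit, and the 0.95 confidence floor are hypothetical examples, not defaults. The asserts at the bottom are the point: each business rule becomes a check that can run on every change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rules:
    auto_approve_limit: float     # hypothetical approval threshold (invoice currency)
    vendor_whitelist: frozenset   # vendors eligible for auto-approval
    min_confidence: float = 0.95  # hypothetical per-field confidence floor


def route(invoice: dict, rules: Rules) -> str:
    """Return 'auto_approve', 'human_review', or 'reject' for one extracted invoice."""
    fields = invoice["fields"]  # field name -> (value, confidence)

    # Any low-confidence field goes to a human, whatever the amount.
    if any(conf < rules.min_confidence for _, conf in fields.values()):
        return "human_review"

    total, _ = fields["total"]
    if total <= 0:
        return "reject"  # hypothetical hard rule: non-positive totals are invalid

    vendor, _ = fields["vendor"]
    if vendor in rules.vendor_whitelist and total <= rules.auto_approve_limit:
        return "auto_approve"
    return "human_review"


# The rules double as tests: each business rule gets at least one assertion.
rules = Rules(auto_approve_limit=1000, vendor_whitelist=frozenset({"ACME Corp"}))

def inv(vendor, total, conf=0.99):
    return {"fields": {"vendor": (vendor, conf), "total": (total, conf)}}

assert route(inv("ACME Corp", 800), rules) == "auto_approve"
assert route(inv("ACME Corp", 4200), rules) == "human_review"   # over the limit
assert route(inv("Unknown Ltd", 800), rules) == "human_review"  # not whitelisted
assert route(inv("ACME Corp", 800, conf=0.70), rules) == "human_review"
assert route(inv("ACME Corp", -50), rules) == "reject"
```

With a 1,000 limit, a €4,200 invoice from a whitelisted vendor still lands in human review, which matches the trace above where the invoice waits on a person's approval.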

From kick-off to live.

01
Discovery
Days 1–7
Map your current process, review samples, agree the rules and the schema.
02
Build
Days 8–28
Agents wired up, schema locked, and the pipeline running end-to-end on your real samples in a sandbox.
03
Test & tune
Days 29–42
Run on live volume in shadow mode. Tune confidence thresholds and rules until you trust it.
04
Go live
Days 43–56+
Cut over. We monitor for the first two weeks. You own it from there — or we keep it on retainer.
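Shadow mode is where thresholds get tuned. Here is one simple way to do it, sketched in Python under stated assumptions. For each extraction, record its confidence and whether the agent's value matched the human's. Then pick the lowest confidence threshold whose auto-passed slice meets the 98% accuracy target. The records below are made-up illustration data, and `sweep` and `pick_threshold` are hypothetical helpers.

```python
from typing import List, Optional, Tuple

# One shadow-mode record: (extraction confidence, did the agent match the human?)
Record = Tuple[float, bool]


def sweep(records: List[Record], thresholds: List[float]):
    """For each threshold: share of docs that would auto-pass, and accuracy on that share."""
    report = []
    for t in thresholds:
        passed = [ok for conf, ok in records if conf >= t]
        coverage = len(passed) / len(records) if records else 0.0
        accuracy = sum(passed) / len(passed) if passed else None
        report.append((t, coverage, accuracy))
    return report


def pick_threshold(records: List[Record], thresholds: List[float],
                   target: float = 0.98) -> Optional[Tuple[float, float]]:
    """Lowest threshold whose auto-passed slice meets the accuracy target."""
    for t, coverage, accuracy in sorted(sweep(records, thresholds), key=lambda r: r[0]):
        if accuracy is not None and accuracy >= target:
            return t, coverage
    return None


# Made-up records, for illustration only.
records = [(0.99, True), (0.97, True), (0.95, True),
           (0.90, False), (0.85, True), (0.60, False)]
for row in sweep(records, [0.8, 0.9, 0.95]):
    print(row)
print(pick_threshold(records, [0.8, 0.9, 0.95]))  # (0.95, 0.5)
```

Accuracy need not rise monotonically with the threshold (0.80 at 0.8 but 0.75 at 0.9 in the sample), so every candidate is evaluated rather than binary-searched. Coverage can only fall as the threshold rises, so the lowest qualifying threshold auto-passes the most documents.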

Have docs piling up?

Book a free 30-minute scoping call. We'll look at your samples and give you an honest read on what's automatable, what isn't, and what it'd cost.
