Document Processing & Data Extraction.

Invoices, contracts, reports, forms — unstructured documents go in, clean structured data comes out. Here's how we actually build the agent system that does it.

Typical build time: 4–8 weeks
Agent count: 5–7 agents
Accuracy target: 98%+ on extraction
Volume per month: 100 → 100,000+

Six agents.
One job.

Each agent has one role and a clear handoff. No giant prompts trying to do everything — small, testable units that an orchestrator coordinates.

Agent 01
Intake
Watcher · Classifier

Validates format, deduplicates, classifies document type.

Tools
IMAP / Gmail API · MIME parser · Hash dedup · Claude (vision)
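
A minimal sketch of the hash-dedup step in Intake, assuming SHA-256 over the raw attachment bytes and an in-memory set. `Deduplicator` and `is_duplicate` are hypothetical names for illustration; a real deployment would persist the digests rather than hold them in memory.

```python
import hashlib


class Deduplicator:
    """Remembers SHA-256 digests of documents already seen (in memory only)."""

    def __init__(self):
        self._seen = set()

    def is_duplicate(self, content: bytes) -> bool:
        """Return True if these exact bytes were seen before, else record them."""
        digest = hashlib.sha256(content).hexdigest()
        if digest in self._seen:
            return True
        self._seen.add(digest)
        return False


dedup = Deduplicator()
first = dedup.is_duplicate(b"%PDF-1.7 sample invoice bytes")   # new document
second = dedup.is_duplicate(b"%PDF-1.7 sample invoice bytes")  # same bytes again
```

Byte-level hashing only catches exact copies. A re-scanned or re-exported version of the same invoice has different bytes and would need a check on extracted fields instead.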
Agent 02
Extraction
OCR · Parser

Pulls every field into a strict schema and confidence-scores each value.

Tools
Claude · GPT-4o · PDF.js · Tesseract OCR · JSON schema
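
Here is one way a strict schema with per-field confidence could look. It is a sketch, not the production schema: the field names in `REQUIRED_FIELDS`, the `Field` type, the 0.9 threshold, and the confidence values in `raw` are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the extraction step


# Hypothetical schema: the exact field list is agreed per project.
REQUIRED_FIELDS = ("vendor", "invoice_number", "total", "currency")


def parse_extraction(raw: dict) -> dict:
    """Check raw extractor output against the schema; raise ValueError on any deviation."""
    missing = [k for k in REQUIRED_FIELDS if k not in raw]
    extra = [k for k in raw if k not in REQUIRED_FIELDS]
    if missing or extra:
        raise ValueError(f"schema mismatch: missing={missing}, extra={extra}")
    fields = {}
    for name in REQUIRED_FIELDS:
        entry = raw[name]
        if not isinstance(entry, dict) or set(entry) != {"value", "confidence"}:
            raise ValueError(f"{name}: expected exactly 'value' and 'confidence'")
        conf = float(entry["confidence"])
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"{name}: confidence {conf} is outside [0, 1]")
        fields[name] = Field(value=str(entry["value"]), confidence=conf)
    return fields


def low_confidence(fields: dict, threshold: float = 0.9) -> list:
    """Names of fields whose confidence falls below the threshold."""
    return [name for name, f in fields.items() if f.confidence < threshold]


# Illustrative output for the ACME invoice; the confidence values are made up.
raw = {
    "vendor": {"value": "ACME Corp", "confidence": 0.99},
    "invoice_number": {"value": "Q2-3119", "confidence": 0.97},
    "total": {"value": "4200.00", "confidence": 0.84},
    "currency": {"value": "EUR", "confidence": 0.99},
}
needs_review = low_confidence(parse_extraction(raw))
```

Rejecting unknown keys as well as missing ones keeps the extractor from silently inventing fields the downstream agents do not expect.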
Agent 03
Validation
Rule engine · QA

Cross-checks fields against business rules. Flags anomalies before a human sees them.

Tools
Supabase RLS · Rule DSL · Fuzzy match · Claude (reasoning)
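
To make the rule checks concrete, here is a small sketch that pairs a fuzzy vendor match with two arithmetic rules. It uses Python's standard `difflib` as a stand-in for whatever matcher a real build uses. The vendor list, the 0.85 similarity threshold, the flag names, and the `net`/`tax` fields are assumptions.

```python
from difflib import SequenceMatcher
from typing import Optional

KNOWN_VENDORS = ["ACME Corp", "Globex GmbH"]  # hypothetical vendor whitelist


def best_vendor_match(name: str, threshold: float = 0.85) -> Optional[str]:
    """Return the closest whitelisted vendor, or None if nothing is similar enough."""
    best, best_score = None, 0.0
    for vendor in KNOWN_VENDORS:
        score = SequenceMatcher(None, name.lower(), vendor.lower()).ratio()
        if score > best_score:
            best, best_score = vendor, score
    return best if best_score >= threshold else None


def validate(invoice: dict) -> list:
    """Return anomaly flags for one invoice; an empty list means it passed."""
    flags = []
    if best_vendor_match(invoice["vendor"]) is None:
        flags.append("unknown_vendor")
    if invoice["total"] <= 0:
        flags.append("non_positive_total")
    # Allow one cent of rounding slack when checking net + tax against the total.
    if abs(invoice["net"] + invoice["tax"] - invoice["total"]) > 0.01:
        flags.append("totals_do_not_add_up")
    return flags


# Illustrative values: the trailing dot in the vendor name still matches "ACME Corp".
clean = validate({"vendor": "ACME Corp.", "net": 3500.0, "tax": 700.0, "total": 4200.0})
flagged = validate({"vendor": "Initech", "net": 3500.0, "tax": 700.0, "total": 4300.0})
```

Each rule is a plain function of the extracted fields, which is what makes the rules testable against your sample documents.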
Agent 04
Routing
Decision · Dispatch

Auto-approves, rejects, or routes to the right human based on your rules.

Tools
Workflow graph · Slack API · Email queue
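
The three-way decision can be sketched as a single function. The limit, the flag names, and the confidence floor below are placeholders; in a real build they come from your business rules.

```python
from enum import Enum


class Decision(Enum):
    AUTO_APPROVE = "auto_approve"
    REJECT = "reject"
    HUMAN_REVIEW = "human_review"


# Hypothetical rule values, for illustration only.
AUTO_APPROVE_LIMIT = 1000.00
HARD_REJECT_FLAGS = {"duplicate", "unknown_vendor"}
MIN_CONFIDENCE = 0.90


def route(total: float, flags: set, min_confidence: float) -> Decision:
    """Pick one outcome for a validated document."""
    if flags & HARD_REJECT_FLAGS:
        return Decision.REJECT
    if flags or min_confidence < MIN_CONFIDENCE or total > AUTO_APPROVE_LIMIT:
        return Decision.HUMAN_REVIEW
    return Decision.AUTO_APPROVE


# A clean 4,200.00 invoice exceeds the assumed limit, so it goes to a human.
decision = route(total=4200.00, flags=set(), min_confidence=0.97)
```

Checking hard rejects first means a duplicate never reaches a reviewer's queue, whatever its amount.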
Agent 05
Filing
Sync · Archive

Posts data to the destination and archives the original with the correct naming.

Tools
Xero API · Drive API · HubSpot · Webhooks
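
What "correct naming" means depends on your filing convention. As an illustration only, this sketch assumes a `date_vendor_number` pattern; `archive_name` is a hypothetical helper and the issue date is made up.

```python
import re
from datetime import date


def archive_name(vendor: str, invoice_number: str, issued: date, ext: str = "pdf") -> str:
    """Build a predictable archive filename from extracted fields."""
    # Lowercase the vendor and collapse anything non-alphanumeric into single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", vendor.lower()).strip("-")
    # Keep only letters, digits, and hyphens in the invoice number.
    number = re.sub(r"[^A-Za-z0-9-]+", "", invoice_number)
    return f"{issued.isoformat()}_{slug}_{number}.{ext}"


name = archive_name("ACME Corp", "Q2-3119", date(2025, 4, 1))
# name == "2025-04-01_acme-corp_Q2-3119.pdf"
```

Leading with an ISO date keeps the archive sorted chronologically in any file browser.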
Agent 00
Orchestrator
Supervisor · Logger

Coordinates handoffs, retries on failure, logs every action for full auditability.

Tools
State machine · Postgres log · Retry policy
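
A rough sketch of the retry-and-log behaviour, assuming exponential backoff and Python's standard `logging` module in place of the Postgres log. `run_with_retry` and its parameters are hypothetical; the actual retry policy is set per project.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("orchestrator")


def run_with_retry(
    step_name: str,
    step: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run one agent step, logging every attempt and backing off between failures."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            result = step()
            log.info("step=%s attempt=%d status=ok", step_name, attempt)
            return result
        except Exception as exc:
            log.warning("step=%s attempt=%d status=error error=%s", step_name, attempt, exc)
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1.0s, 2.0s, ...
    raise RuntimeError("unreachable")
```

Passing `sleep` as a parameter lets tests run the backoff logic without actually waiting.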

A €4,200 invoice
arrives at 09:14.

ACME Corp emails an invoice. Nobody opens the email. By 09:14:42 it's reconciled, filed, and waiting on Sarah's approval. Here's the trace.

live trace · invoice-Q2-3119.pdf · processing
Orchestrator (Supervisor · State machine): coordinating handoffs
Intake (Watcher): idle
Extraction (OCR · Parser): idle
Validation (Rules · QA): idle
Routing (Decision): idle
Filing (Sync): idle

What we need from you.

📥
15–30 sample documents

Real ones, not synthetic. Mix of clean cases and edge cases — that's where most of the engineering time goes.

📋
Your business rules

Approval thresholds, vendor whitelist, what's auto-approve vs. human review. We turn these into testable rules.

🔌
API access

Read access to where docs come from (inbox, drive) and write access to where data goes (CRM, accounting).

👤
A point person

One human who can answer "is this edge case real or noise?" during the build. ~2 hours/week for 3 weeks.

From kick-off to live.

01
Discovery
Days 1–7

Map your current process, review samples, agree the rules and the schema.

02
Build
Days 8–28

Agents wired up, schema locked, end-to-end working on your real samples in a sandbox.

03
Test & tune
Days 29–42

Run on live volume in shadow mode. Tune confidence thresholds and rules until you trust it.

04
Go live
Days 43–56+

Cut over. We monitor for the first two weeks. You own it from there — or we keep it on retainer.
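
Tuning a confidence threshold during the shadow-mode phase can be sketched as follows. The idea is to compare each shadow-run value against what a human confirmed, then pick the lowest threshold at which the auto-accepted values still meet the accuracy target. `pick_threshold` and the sample records are illustrative assumptions, not real results.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    records: List[Tuple[float, bool]], target_accuracy: float = 0.98
) -> Optional[float]:
    """Lowest confidence threshold whose accepted values meet the target, or None.

    Each record is (confidence, was_correct) for one extracted value.
    """
    for threshold in sorted({conf for conf, _ in records}):
        accepted = [ok for conf, ok in records if conf >= threshold]
        if accepted and sum(accepted) / len(accepted) >= target_accuracy:
            return threshold
    return None


# Made-up shadow-mode records: (model confidence, did it match the human answer?)
records = [
    (0.99, True), (0.97, True), (0.95, True),
    (0.90, False), (0.85, True), (0.80, False),
]
threshold = pick_threshold(records)
# threshold == 0.95: everything at or above 0.95 was correct in this sample
```

A lower threshold auto-approves more documents but lets more errors through, so the target accuracy is the dial you end up adjusting.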

Have docs piling up?

Book a free 30-minute scoping call. We'll look at your samples and give you an honest read on what's automatable, what isn't, and what it'd cost.