// Knowledge base · use case

Document Processing & Data Extraction.

Invoices, contracts, reports, forms — unstructured documents go in, clean structured data comes out. Here's how we actually build the agent system that does it.

Typical build time: 4–8 weeks

Agent count: 5–7 agents

Accuracy target: 98%+ on extraction

Volume per month: 100 → 100,000+

// The team

Six agents.
One job.

Each agent has one role and a clear handoff. No giant prompts trying to do everything — small, testable units that an orchestrator coordinates.

Agent 01

Intake

Watcher · Classifier

Validates format, deduplicates, classifies document type.

Tools

IMAP / Gmail API, MIME parser, Hash dedup, Claude (vision)
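To make the "Hash dedup" step concrete, here is a minimal sketch in Python. It assumes deduplication is done on the exact bytes of the attachment with SHA-256; the in-memory set and the function name `is_duplicate` are illustrative stand-ins, not the production store.

```python
import hashlib

# Hypothetical in-memory store of digests already seen.
# A real deployment would persist these (for example in a database table).
_seen_digests: set[str] = set()


def is_duplicate(file_bytes: bytes) -> bool:
    """Return True if these exact bytes were seen before; otherwise record them."""
    digest = hashlib.sha256(file_bytes).hexdigest()
    if digest in _seen_digests:
        return True
    _seen_digests.add(digest)
    return False
```

Byte-level hashing only catches exact re-sends. A rescanned or re-exported copy of the same invoice has different bytes, so it would need a content-level check later in the pipeline.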

Agent 02

Extraction

OCR · Parser

Pulls every field into a strict schema and confidence-scores each value.

Tools

Claude · GPT-4o, PDF.js, Tesseract OCR, JSON schema
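A strict schema with a confidence score per value might look like the sketch below. The field names in `REQUIRED_FIELDS` and the shape of the raw model output (`{"value": ..., "confidence": ...}` per field) are assumptions for illustration; the real schema is agreed per client during discovery.

```python
from dataclasses import dataclass

# Hypothetical field list; the actual schema is agreed during discovery.
REQUIRED_FIELDS = ("vendor", "invoice_number", "total", "currency")


@dataclass(frozen=True)
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 (no trust) to 1.0 (certain)

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"{self.name}: confidence out of range")


def parse_extraction(raw: dict) -> dict[str, ExtractedField]:
    """Reject output with missing or unexpected fields, then type each value."""
    missing = [f for f in REQUIRED_FIELDS if f not in raw]
    extra = [k for k in raw if k not in REQUIRED_FIELDS]
    if missing or extra:
        raise ValueError(f"schema mismatch: missing={missing}, unexpected={extra}")
    return {
        name: ExtractedField(
            name, str(raw[name]["value"]), float(raw[name]["confidence"])
        )
        for name in REQUIRED_FIELDS
    }
```

Failing loudly on a schema mismatch keeps malformed model output from reaching the validation agent.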

Agent 03

Validation

Rule engine · QA

Cross-checks fields against business rules. Flags anomalies before a human sees them.

Tools

Supabase RLS, Rule DSL, Fuzzy match, Claude (reasoning)

Agent 04

Routing

Decision · Dispatch

Auto-approves, rejects, or routes to the right human based on your rules.

Tools

Workflow graph, Slack API, Email queue
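The three outcomes (auto-approve, reject, route to a human) can be sketched as one small decision function. The thresholds and the `duplicate_invoice` flag below are placeholders; the real values come from your approval rules.

```python
from enum import Enum


class Decision(Enum):
    AUTO_APPROVE = "auto_approve"
    REJECT = "reject"
    HUMAN_REVIEW = "human_review"


# Hypothetical rule values, for illustration only.
AUTO_APPROVE_LIMIT = 1000.00   # totals above this always go to a human
MIN_CONFIDENCE = 0.95          # lowest acceptable field confidence
HARD_FAIL_FLAGS = {"duplicate_invoice"}


def route(total: float, min_confidence: float, flags: list[str]) -> Decision:
    """Pick an outcome from the invoice total, weakest field confidence, and flags."""
    if HARD_FAIL_FLAGS & set(flags):
        return Decision.REJECT
    if flags or min_confidence < MIN_CONFIDENCE or total > AUTO_APPROVE_LIMIT:
        return Decision.HUMAN_REVIEW
    return Decision.AUTO_APPROVE
```

Under these assumed limits, a clean €4,200 invoice would still land in human review because it exceeds the auto-approve limit.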

Agent 05

Filing

Sync · Archive

Posts data to the destination and archives the original with the correct naming.

Tools

Xero API, Drive API, HubSpot, Webhooks
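"Correct naming" depends on your own archive convention. Purely as an example, the sketch below assumes a `date_vendor_number.ext` pattern; the pattern and the function name `archive_name` are hypothetical.

```python
import re
from datetime import date


def archive_name(vendor: str, invoice_number: str, issued: date, ext: str = "pdf") -> str:
    """Build a predictable, filesystem-safe archive filename."""
    # Lowercase the vendor and collapse anything non-alphanumeric into hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", vendor.lower()).strip("-")
    # Keep the invoice number's case, but drop characters unsafe in filenames.
    number = re.sub(r"[^A-Za-z0-9-]+", "-", invoice_number).strip("-")
    return f"{issued.isoformat()}_{slug}_{number}.{ext}"
```

Putting the ISO date first means archived files sort chronologically in any file browser.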

Agent 00

Orchestrator

Supervisor · Logger

Coordinates handoffs, retries on failure, logs every action for full auditability.

Tools

State machine, Postgres log, Retry policy
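The orchestrator's three duties (handoffs, retries, logging) fit in a short loop. This is a simplified sketch that assumes a strictly linear pipeline, a fixed attempt limit, and an in-memory list standing in for the Postgres log; the stage names mirror the agents above, but `run_pipeline` and its parameters are illustrative.

```python
import time
from typing import Any, Callable

PIPELINE = ["intake", "extraction", "validation", "routing", "filing"]


def run_pipeline(
    doc: Any,
    handlers: dict[str, Callable[[Any], Any]],
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> tuple[str, Any, list[tuple[str, int, str]]]:
    """Run each stage in order, retrying failures and logging every attempt.

    Returns (final_state, document, log), where final_state is "done" or
    "failed:<stage>" and each log entry is (stage, attempt, outcome).
    """
    log: list[tuple[str, int, str]] = []
    for stage in PIPELINE:
        for attempt in range(1, max_attempts + 1):
            try:
                doc = handlers[stage](doc)
                log.append((stage, attempt, "ok"))
                break
            except Exception as exc:
                log.append((stage, attempt, f"error: {exc}"))
                if attempt == max_attempts:
                    return f"failed:{stage}", doc, log
                # Linear backoff before the next attempt.
                time.sleep(backoff_seconds * attempt)
    return "done", doc, log
```

Because every attempt is logged, including failed ones, the log doubles as the audit trail for a given document.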

// Real example

A €4,200 invoice
arrives at 09:14.

ACME Corp emails an invoice. Nobody opens the email. By 09:14:42 it's reconciled, filed, and waiting on Sarah's approval. Here's the trace.

live trace · invoice-Q2-3119.pdf · processing

Orchestrator

Supervisor · State machine

coordinating handoffs

agents

Intake

Watcher

idle

Extraction

OCR · Parser

idle

Validation

Rules · QA

idle

Routing

Decision

idle

Filing

Sync

idle

// Inputs

What we need from you.

📥

15–30 sample documents

Real ones, not synthetic. Mix of clean cases and edge cases — that's where most of the engineering time goes.

📋

Your business rules

Approval thresholds, vendor whitelist, what's auto-approve vs. human review. We turn these into testable rules.

🔌

API access

Read access to where docs come from (inbox, drive) and write access to where data goes (CRM, accounting).

👤

A point person

One human who can answer "is this edge case real or noise?" during the build. ~2 hours/week for 3 weeks.

// Timeline

From kick-off to live.

Discovery

Days 1–7

Map your current process, review samples, agree the rules and the schema.

Build

Days 8–28

Agents wired up, schema locked, end-to-end working on your real samples in a sandbox.

Test & tune

Days 29–42

Run on live volume in shadow mode. Tune confidence thresholds and rules until you trust it.

Go live

Days 43–56+

Cut over. We monitor for the first two weeks. You own it from there — or we keep it on retainer.

Have docs piling up?

Book a free 30-minute scoping call. We'll look at your samples and give you an honest read on what's automatable, what isn't, and what it'd cost.

Book a Call →
