Custom Document AI that extracts structured fields from images, PDFs, and OCR output with high accuracy.
Input Document
Extracted Fields
"Karan XXX Reddy"
"05/XX/1998"
"MALE"
"XXX XXXX XXXX 7890"
Deepweights DER is a Document AI platform that converts unstructured documents — scans, photos, PDFs — into clean structured JSON your applications can use directly.
// Response (default — layout output)
{
  "width": 1280,
  "height": 816,
  "boxes": [
    { "label": "name", "text": "Karan XXX Reddy", "score": 0.939, "box": [[426,230],[704,230],[704,265],[426,265]] },
    { "label": "dob", "text": "05/XX/1998", "score": 0.955, "box": [[422,273],[828,273],[828,306],[422,306]] },
    { "label": "gender", "text": "MALE", "score": 0.913, "box": [[421,319],[590,319],[590,359],[421,359]] },
    { "label": "uid", "text": "XXX XXXX XXXX 7890", "score": 0.966, "box": [[372,647],[960,647],[960,690],[372,690]] }
  ]
}

// Add ?format=simple for clean key-value output
{
  "name": "Karan XXX Reddy",
  "dob": "05/XX/1998",
  "gender": "MALE",
  "uid": "XXX XXXX XXXX 7890"
}
Everything you need to extract structured data from documents at scale.
Each document type gets its own model trained on your samples — no generic one-size-fits-all approach.
Works with Google Vision, Azure AI Vision, or custom OCR — any engine that outputs BoundingPoly coordinates.
Optional post-extraction rules to validate and reformat dates, IDs, phone numbers, and more.
Uses spatial position alongside text. Bounding boxes tell the model where a field appears — not just what it says.
Handles skewed scans, low-res photos, mixed-script text, and stamps — real-world conditions, not lab images.
One endpoint, JSON in, JSON out. Two modes: direct image upload or pass your OCR output. No SDK required.
From document samples to live API in days.
Send us a set of sample documents — no annotation needed; we handle all labelling ourselves.
We train a custom model for your document type and deploy it on our infrastructure — fully managed, no setup on your end.
Send your OCR output or raw image. Get structured fields back — with layout coordinates and confidence scores — in milliseconds.
Bring your own OCR output, or upload an image and let us handle OCR via Google Cloud Vision or our internal model. Same structured response either way.
// POST /der/model/predict/{model_id} — Send OCR output (rec + det), model ID in path
POST /der/model/predict/{model_id}
Content-Type: application/json

{
  "rules": {},
  "rec": [
    "Karan XXX Reddy",
    "जन्म तिथि/DOB: 05/XX/1998",
    "पुरुष/ MALE",
    "XXX XXXX XXXX 7890",
    ... // full OCR text array
  ],
  "det": [
    [[426,230],[704,230],[704,265],[426,265]],
    [[422,273],[828,273],[828,306],[422,306]],
    ... // bounding boxes for each word
  ],
  "width": 1280,
  "height": 816
}

─────────────────────────────────────────

// Response (default — layout output)
{
  "width": 1280,
  "height": 816,
  "boxes": [
    { "label": "name", "text": "Karan XXX Reddy", "score": 0.939, "box": [[426,230],...] },
    { "label": "dob", "text": "05/XX/1998", "score": 0.955, "box": [[422,273],...] },
    { "label": "uid", "text": "XXX XXXX XXXX 7890", "score": 0.966, "box": [[372,647],...] }
  ]
}

// Add ?format=simple for clean key-value output
{
  "name": "Karan XXX Reddy",
  "dob": "05/XX/1998",
  "gender": "MALE",
  "uid": "XXX XXXX XXXX 7890"
}
Append ?format=simple to either endpoint to receive a flat key-value object instead of the full layout response. Ideal for downstream application logic.
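As a minimal sketch of assembling the request above — the base URL here is a placeholder, and `build_request` is a hypothetical helper, not part of any official SDK:

```python
import json

# Placeholder base URL -- substitute your actual Deepweights DER endpoint.
BASE_URL = "https://api.example.com/der/model/predict"

def build_request(model_id, rec, det, width, height, simple=False):
    """Assemble the URL and JSON body for the OCR-input predict endpoint."""
    url = f"{BASE_URL}/{model_id}"
    if simple:
        url += "?format=simple"  # flat key-value response instead of layout
    payload = {
        "rules": {},
        "rec": rec,      # OCR text per word/line
        "det": det,      # matching bounding polygons, one per rec entry
        "width": width,
        "height": height,
    }
    return url, json.dumps(payload)

url, body = build_request(
    "id-card-v1",
    rec=["Karan XXX Reddy", "05/XX/1998"],
    det=[[[426, 230], [704, 230], [704, 265], [426, 265]],
         [[422, 273], [828, 273], [828, 306], [422, 273]]],
    width=1280, height=816, simple=True,
)
print(url)  # -> https://api.example.com/der/model/predict/id-card-v1?format=simple
```

POST the resulting `body` to `url` with a `Content-Type: application/json` header using any HTTP client.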
Simple key-value fields, or full layout data with positions and scores.
// GET ?format=simple — flat key-value response
{ "name": "Karan XXX Reddy", "dob": "05/XX/1998", "gender": "MALE", "uid": "XXX XXXX XXXX 7890" }
// Default — layout output with bounding boxes, labels, and scores
{ "width": 1280, "height": 816, "boxes": [ { "label": "name", "text": "Karan XXX Reddy", "fmt_text": null, "score": 0.9385673851198314, "box": [[426,230],[704,230],[704,265],[426,265]] }, { "label": "dob", "text": "जन्म तिथि/DOB: 05/XX/1998", "fmt_text": null, "score": 0.9548244589662007, "box": [[422,273],[828,273],[828,306],[422,306]] }, { "label": "uid", "text": "XXX XXXX XXXX 7890", "fmt_text": null, "score": 0.9663165262058003, "box": [[372,647],[960,647],[960,690],[372,690]] } ] }
Your documents leave no trace, because they never stay.
Documents and images are never written to disk or persisted anywhere in our infrastructure.
Extracted fields are returned to you and never cached or stored server-side.
API payloads are never logged. What goes in stays completely private.
Billing uses only an anonymized request counter — nothing else is tracked.
No OCR lock-in — use any engine that outputs BoundingPoly (polygon coordinates per word). We use spatial layout for extraction, so polygon accuracy matters.
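For example, a Google Cloud Vision REST response can be mapped into the `rec`/`det` arrays. This is a sketch against Vision's documented JSON shape (the first `textAnnotations` entry is the full page text, so it is skipped); verify it against your actual OCR output:

```python
def vision_to_rec_det(vision_response):
    """Convert Vision textAnnotations into parallel rec (text) and
    det (polygon) arrays for the predict endpoint."""
    rec, det = [], []
    for ann in vision_response["textAnnotations"][1:]:  # [0] is full-page text
        rec.append(ann["description"])
        # Vision omits vertex coordinates that are zero, hence .get(..., 0)
        det.append([[v.get("x", 0), v.get("y", 0)]
                    for v in ann["boundingPoly"]["vertices"]])
    return rec, det

sample = {
    "textAnnotations": [
        {"description": "Karan Reddy",  # full-text entry (skipped)
         "boundingPoly": {"vertices": [{"x": 0, "y": 0}, {"x": 9, "y": 0},
                                       {"x": 9, "y": 9}, {"y": 9}]}},
        {"description": "Karan",
         "boundingPoly": {"vertices": [{"x": 426, "y": 230}, {"x": 500, "y": 230},
                                       {"x": 500, "y": 265}, {"x": 426, "y": 265}]}},
    ],
}
rec, det = vision_to_rec_det(sample)
print(rec)  # ['Karan']
```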
Best for high-volume cloud pipelines.
Any engine that outputs BoundingPoly coordinates.
You can send detection only — bounding boxes without text — when you have sensitive data. The model works from spatial layout alone, so OCR text never leaves your infrastructure.
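A detection-only payload might look like the sketch below. Whether `rec` should be omitted entirely or sent empty is an assumption about the API contract; confirm against the actual endpoint documentation:

```python
import json

def detection_only_payload(det, width, height):
    """Build a request body with bounding boxes only -- no OCR text.
    NOTE: sending an empty "rec" array is an assumption, not a
    documented contract; the API may instead expect the key omitted."""
    return json.dumps({
        "rules": {},
        "rec": [],          # no text: the model works from spatial layout alone
        "det": det,
        "width": width,
        "height": height,
    })

body = detection_only_payload(
    [[[426, 230], [704, 230], [704, 265], [426, 265]]], 1280, 816)
```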
High-stakes document extraction where accuracy and privacy matter most.
Extract name, date of birth, ID numbers, and address from identity documents at scale — Aadhaar, passports, driver's licenses.
Extract key fields from banking documents — payee name, amount, date, and account number.
Upcoming
Pricing scales with document complexity — measured by the character length of the serialized detection array. No monthly minimums.
Tier is determined by len(json.dumps(det)) — the character length of the JSON-serialized detection array. Price estimates above are based on a model with 1024-dimensional vectors. Only the request count is tracked; no data is retained.
Image upload endpoint: Extraction pricing follows the same tier structure, but OCR cost is charged separately on top — billed based on the OCR provider used (Google Cloud Vision or internal model).
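The billing measure itself is easy to reproduce client-side before sending a request. This only computes the measure; tier boundaries and prices are not restated here:

```python
import json

# The billing measure: character length of the JSON-serialized det array.
det = [
    [[426, 230], [704, 230], [704, 265], [426, 265]],
    [[422, 273], [828, 273], [828, 306], [422, 306]],
]
billing_length = len(json.dumps(det))
print(billing_length)  # -> 100
```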
Get API access and go from document chaos to structured JSON in days.