Coming Soon

Document Processing

Drop in a PDF, invoice, or scanned form and get structured data back. Three separately billed profiles (Read, Layout, and Form Extraction) let you control cost and output granularity. Combine them freely.

Try Interlocute Read the docs

PDF Read — text and language

The Read profile extracts OCR text with positional layout, semantic paragraphs with roles (title, heading, footer, footnote), font styles (bold, italic, handwritten), and per-span language detection. Built on the $1.50/1K-page tier — the most cost-effective way to get text out of documents.

PDF Layout — structure and figures

The Layout profile extracts page images, structured tables as JSON, document sections with hierarchy, detected figures and charts with bounding boxes, and mathematical formulas as LaTeX. Built on the $10/1K-page tier for deep structural understanding.

Form Extraction — fields and barcodes

The Form Extraction profile returns key-value pairs (labels mapped to values), structured tables, selection marks (checkboxes, radio buttons with state), and detected barcodes/QR codes with type and value. Purpose-built for invoices, receipts, and application forms.

Composable profiles

Profiles are non-overlapping cost buckets. Combine Read + Layout for full-page understanding without paying the prebuilt tier for features available at the read tier. Add Form Extraction only when you need structured field data. The platform groups derivations automatically.

Frequently Asked Questions

What document formats are supported?

Interlocute supports PDF, common image formats (JPEG, PNG, TIFF) for scanned documents, and plain text. Documents can be submitted via URL, blob storage, or inline as base64.
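As a rough illustration of inline submission, the sketch below base64-encodes a document and wraps it in a JSON job descriptor. The descriptor shape is an assumption: the field names `source`, `kind`, `data`, `profiles`, and `pages`, and the helper `build_inline_job`, are hypothetical and do not come from the Interlocute API reference. Check the real schema before relying on any of it.

```python
import base64
import json
from typing import List, Optional


def build_inline_job(document_bytes: bytes,
                     profiles: List[str],
                     pages: Optional[str] = None) -> str:
    """Build a JSON job descriptor carrying the document inline as base64.

    All field names below are hypothetical placeholders, not the real schema.
    """
    descriptor = {
        "source": {
            "kind": "inline",  # hypothetical: alternatives might be a URL or blob reference
            "data": base64.b64encode(document_bytes).decode("ascii"),
        },
        "profiles": profiles,  # hypothetical profile identifiers
    }
    if pages is not None:
        descriptor["pages"] = pages  # hypothetical page range field, e.g. "1-3,5"
    return json.dumps(descriptor)


# Usage: encode a small placeholder payload and request the Read profile.
job_json = build_inline_job(b"%PDF-1.7 placeholder bytes", ["read"], pages="1-3,5")
```

Base64 inflates the payload by roughly a third, so URL or blob storage submission is likely the better fit for large files.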

What are the three document profiles?

PDF Read extracts OCR text, paragraphs, styles, and languages ($1.50/1K pages). PDF Layout extracts tables, sections, figures, formulas, and page images ($10/1K pages). Form Extraction extracts key-value pairs, structured tables, selection marks, and barcodes ($10/1K pages). Combine any of them in a single request.

Can Interlocute extract tables from PDFs?

Yes, via the PDF Layout or Form Extraction profiles. Both include table extraction as structured JSON representing rows and columns. This works for both native PDF tables and tables in scanned images.

Does it detect barcodes and QR codes?

Yes, via the Form Extraction profile. Barcodes and QR codes are detected with their type, value, and bounding polygon.

How is document processing billed?

Document processing is metered per page processed. The Read profile uses the $1.50/1K-page tier; Layout and Form Extraction use the $10/1K-page tier. You only pay for the profiles you select.
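To make the per-page arithmetic concrete, here is a minimal cost estimator built from the rates quoted on this page. It assumes that each selected profile is billed independently and that the charges simply add up, which this page implies but does not state outright. The profile keys and the function name `estimate_cost` are illustrative, not part of any Interlocute API.

```python
from decimal import Decimal
from typing import List

# Per-1,000-page rates quoted on this page, in US dollars.
# The dictionary keys are illustrative labels, not official identifiers.
RATE_PER_1K_PAGES = {
    "read": Decimal("1.50"),
    "layout": Decimal("10.00"),
    "form_extraction": Decimal("10.00"),
}


def estimate_cost(pages: int, profiles: List[str]) -> Decimal:
    """Estimate the charge for running the selected profiles on `pages` pages.

    Assumption: selected profiles are billed independently and summed.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    selected = set(profiles)  # a profile listed twice is counted once
    unknown = selected - set(RATE_PER_1K_PAGES)
    if unknown:
        raise ValueError(f"unknown profiles: {sorted(unknown)}")
    rate_sum = sum((RATE_PER_1K_PAGES[p] for p in selected), Decimal("0"))
    return rate_sum * pages / 1000


# Usage: 2,000 pages through Read only, then through Read + Layout.
read_only = estimate_cost(2000, ["read"])               # 2 x $1.50
read_layout = estimate_cost(2000, ["read", "layout"])   # 2 x ($1.50 + $10.00)
```

Under these assumptions, adding Layout to a Read job raises the estimate from $3.00 to $23.00 for 2,000 pages, which is why selecting only the profiles you need matters.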

Can I process specific pages of a document?

Yes. The job descriptor accepts a page range parameter (e.g., '1-3,5') so you can process a subset of pages instead of the entire document.
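A page range string such as '1-3,5' can be expanded into explicit page numbers before submission, for example to count how many pages a job will cover. The parser below is a sketch that assumes 1-based page numbers, inclusive ranges, and comma-separated parts. The exact grammar the service accepts is not documented on this page, and `parse_page_range` is a hypothetical helper.

```python
from typing import List


def parse_page_range(spec: str) -> List[int]:
    """Expand a page range string like '1-3,5' into a sorted list of pages.

    Assumes 1-based page numbers and inclusive ranges.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty segment in page range: {spec!r}")
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range segment: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


# Usage: the example range from the answer above.
selected_pages = parse_page_range("1-3,5")  # [1, 2, 3, 5]
```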

Documentation

Document Processing Guide

API Reference

Related Features

Video Intelligence

Upload a video and get a structured AI index: speech transcripts, visual scene analysis, entity extraction, sentiment, and AI summaries — choose the signals you need.

Image Intelligence

Upload an image and get layered AI analysis: a structural fingerprint with instant local metrics, semantic understanding from a multimodal LLM, and full forensic verification with manipulation detection.

RAG (Knowledge Retrieval)

Give your AI nodes access to your own documents and data. Interlocute handles the vector search, chunking, and context injection automatically.

Ready to build with Document Processing?

Deploy your node in seconds and start using Document Processing today.