Mistral Releases OCR 3 With Improved Accuracy on Handwritten and Structured Documents

Mistral has released Mistral OCR 3, the latest version of its optical character recognition model, focused on higher accuracy across a wide range of document types, including handwritten notes, forms, low-quality scans, and complex tables.

According to Mistral, OCR 3 represents a significant step forward compared to its predecessor. In internal evaluations based on real customer document workflows, the new model achieved a 74% overall win rate over Mistral OCR 2, with the gains most visible on forms, handwritten content, and table-heavy documents. The benchmarks score outputs against ground truth using fuzzy-match metrics and are designed to reflect operational business scenarios rather than clean, synthetic samples.
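Mistral has not published the exact metric behind these numbers, so the following is only a minimal sketch of how a fuzzy-match win rate could be computed. It assumes a character-level similarity from Python's `difflib` and assumes ties are excluded from the count; the function names `fuzzy_score` and `win_rate` are hypothetical and not part of any Mistral tooling.

```python
from difflib import SequenceMatcher


def fuzzy_score(predicted: str, truth: str) -> float:
    """Similarity in [0, 1] between an OCR output and its ground truth.

    Assumption: character-level SequenceMatcher ratio stands in for
    whatever fuzzy metric Mistral actually uses.
    """
    return SequenceMatcher(None, predicted, truth).ratio()


def win_rate(new_outputs, old_outputs, truths) -> float:
    """Share of documents where the new model scores strictly higher.

    Assumption: documents where both models tie are skipped.
    """
    wins = 0
    decided = 0
    for new, old, truth in zip(new_outputs, old_outputs, truths):
        new_score = fuzzy_score(new, truth)
        old_score = fuzzy_score(old, truth)
        if new_score == old_score:
            continue  # tie: neither model wins this document
        decided += 1
        if new_score > old_score:
            wins += 1
    return wins / decided if decided else 0.0
```

Under this reading, a 74% win rate would mean the new model produced the closer match on roughly three out of four documents where the two outputs differed in quality.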


Source: Mistral Blog

Technically, Mistral OCR 3 is designed to extract both text and embedded images while preserving document structure. Output is generated in Markdown, with tables reconstructed as HTML that uses attributes such as rowspan and colspan, so downstream systems can retain layout semantics rather than plain text alone. This makes the model suitable for pipelines that require structured JSON, searchable archives, or integration with agentic and retrieval-based systems.
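As an illustration of what retaining layout semantics can look like downstream, the sketch below expands an HTML table with rowspan and colspan attributes into a rectangular grid using only the Python standard library. The sample markup and the names `TableGridParser` and `table_to_grid` are hypothetical; actual OCR 3 output may differ in detail.

```python
from html.parser import HTMLParser


class TableGridParser(HTMLParser):
    """Collect table cells together with their rowspan and colspan."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of (text, rowspan, colspan) per <tr>
        self._cell = None  # cell currently being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:
                self.rows.append([])  # tolerate a cell without a <tr>
            a = dict(attrs)
            self._cell = {
                "text": [],
                "rowspan": int(a.get("rowspan") or 1),
                "colspan": int(a.get("colspan") or 1),
            }

    def handle_data(self, data):
        if self._cell is not None:
            self._cell["text"].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = "".join(self._cell["text"]).strip()
            self.rows[-1].append((text, self._cell["rowspan"], self._cell["colspan"]))
            self._cell = None


def table_to_grid(html: str) -> list:
    """Return the table as a list of rows, repeating text across spanned cells."""
    parser = TableGridParser()
    parser.feed(html)
    grid = {}  # (row, col) -> cell text
    for r, cells in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in cells:
            while (r, c) in grid:  # skip slots filled by an earlier rowspan
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]
```

Repeating the spanned text in every covered slot is one design choice among several; it keeps each row self-describing, which is convenient when the grid is later converted to JSON records.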

The model shows improvements in areas that typically require manual review. It handles handwritten content, including cursive notes and annotations, more effectively, and its form parsing is better at detecting labels, checkboxes, and mixed entries. OCR 3 is also more resilient to the skew, compression artifacts, low resolution, and background noise found in scanned archives.

Early users highlight performance and language coverage as notable improvements. Patrick Jacobs, an ICT security leader and AI security specialist, commented:

Really impressed with the speed. And the fact that it has absolutely no problem with the Dutch language.

Production deployments are already expanding as a result of the improved accuracy. Niraj Bhatt, founder and principal consultant at Techseria, described how the update changes operational scope:

We’ve been running Mistral OCR in production on sales and purchase invoices with zero-touch data entry into our ERP. Seeing a 74% jump in accuracy on forms and handwriting in v3 means we can finally expand to delivery notes, utility bills, and legacy archives that used to be human-only.

Mistral OCR 3 is offered at $2 per 1,000 pages, with a Batch API option reducing the cost to $1 per 1,000 pages, positioning it as a lower-cost alternative to many enterprise OCR systems. Developers can call the model directly via API using the identifier mistral-ocr-2512, while non-technical users can access it through the drag-and-drop Document AI Playground interface.
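To make the pricing and the API route concrete, here is a rough sketch that estimates cost from the two listed rates and prepares, without sending, an HTTP request for the model. The endpoint URL and the request body shape are assumptions based on Mistral's publicly documented OCR API for earlier versions and should be checked against the current documentation; `estimate_cost` and `build_ocr_request` are hypothetical helper names.

```python
import json
import urllib.request
from decimal import Decimal

MODEL_ID = "mistral-ocr-2512"               # identifier given in the article
API_URL = "https://api.mistral.ai/v1/ocr"   # assumption: same endpoint as earlier OCR versions

STANDARD_RATE = Decimal("2.00")  # USD per 1,000 pages
BATCH_RATE = Decimal("1.00")     # USD per 1,000 pages via the Batch API


def estimate_cost(pages: int, batch: bool = False) -> Decimal:
    """Estimated cost in USD for a given page count at the listed rates."""
    rate = BATCH_RATE if batch else STANDARD_RATE
    return Decimal(pages) * rate / Decimal(1000)


def build_ocr_request(document_url: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request for one document.

    Assumption: the body uses a `document` object of type `document_url`.
    """
    body = {
        "model": MODEL_ID,
        "document": {"type": "document_url", "document_url": document_url},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

At these rates, an archive of 250,000 pages would come to $500 through the standard API and $250 through the Batch API. Passing the prepared request to `urllib.request.urlopen` would send it, assuming a valid API key and that the endpoint matches the assumption above.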

For organizations with strict data governance requirements, Mistral continues to offer self-hosted deployment options, allowing OCR workloads to run entirely within controlled infrastructure.

Mistral OCR 3 is available today and is fully backward compatible with OCR 2.
