[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-mistral-ocr-4-document-ai-structure-en":3,"article-related-mistral-ocr-4-document-ai-structure-en":30,"series-research-567f2a82-494e-493a-9d43-00dfbc8a7bfd":75},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":22,"views":26,"created_at":27,"published_at":28,"topic_cluster_id":29},"567f2a82-494e-493a-9d43-00dfbc8a7bfd","mistral-ocr-4-document-ai-structure-en","Mistral OCR 4 brings structure to document AI","\u003Cp data-speakable=\"summary\">Mistral OCR 4 turns scanned pages into structured document data with boxes, labels, and confidence scores.\u003C\u002Fp>\u003Cp>\u003Ca href=\"https:\u002F\u002Fmistral.ai\u002Fnews\u002Focr-4\u002F\" target=\"_blank\" rel=\"noopener\">Mistral AI\u003C\u002Fa> has released \u003Ca href=\"https:\u002F\u002Fmistral.ai\u002Fnews\u002Focr-4\u002F\" target=\"_blank\" rel=\"noopener\">Mistral OCR 4\u003C\u002Fa>, and the pitch is simple: this is OCR built for document intelligence, not plain text extraction. 
The model supports 170 languages across 10 language groups, can run in a single container for self-hosted deployments, and the \u003Ca href=\"\u002Ftag\u002Fapi\">API\u003C\u002Fa> starts at $4 per 1,000 pages, or $2 per 1,000 pages with Batch API.\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>Metric\u003C\u002Fth>\u003Cth>Value\u003C\u002Fth>\u003Cth>Why it matters\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>Language coverage\u003C\u002Ftd>\u003Ctd>170 languages\u003C\u002Ftd>\u003Ctd>Useful for global document pipelines and low-resource scripts\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>API price\u003C\u002Ftd>\u003Ctd>$4 per 1,000 pages\u003C\u002Ftd>\u003Ctd>Sets the baseline for high-volume OCR workloads\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Batch API price\u003C\u002Ftd>\u003Ctd>$2 per 1,000 pages\u003C\u002Ftd>\u003Ctd>Halves cost for asynchronous ingestion jobs\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Document AI price\u003C\u002Ftd>\u003Ctd>$5 per 1,000 pages\u003C\u002Ftd>\u003Ctd>No-code path for teams using Mistral Studio\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>OlmOCRBench score\u003C\u002Ftd>\u003Ctd>85.20\u003C\u002Ftd>\u003Ctd>Top score in Mistral’s reported benchmark set\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>OCR that keeps the page structure\u003C\u002Fh2>\u003Cp>The big change in OCR 4 is that it returns more than text. Each block gets a bounding box, a block type, and inline confidence scores, so downstream systems can tell where content lives on the page and how reliable each extraction is. 
That matters when you are building search, \u003Ca href=\"\u002Ftag\u002Frag\">RAG\u003C\u002Fa>, invoice parsing, or compliance workflows, because plain text alone throws away layout cues that often carry meaning.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782468180808-ulcg.png\" alt=\"Mistral OCR 4 brings structure to document AI\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Mistral says OCR 4 is built as a small, focused model, which is a very different bet from giant general-purpose models that can read documents as a side effect. The company is targeting enterprises that need structured output, low-friction ingestion, and the option to keep data inside their own infrastructure. For teams dealing with contracts, scientific papers, or multilingual archives, that combination is more practical than raw transcription.\u003C\u002Fp>\u003Cul>\u003Cli>Bounding boxes help highlight exact text regions in a UI.\u003C\u002Fli>\u003Cli>Typed blocks make it easier to separate tables, titles, equations, and signatures.\u003C\u002Fli>\u003Cli>Confidence scores support human review and redaction workflows.\u003C\u002Fli>\u003Cli>Single-container deployment helps with sovereignty and residency requirements.\u003C\u002Fli>\u003C\u002Ful>\u003Cp>The model also accepts common enterprise formats such as PDF, DOC, PPT, and OpenDocument. That is a small detail on paper, but it matters in real deployments, where the input mix is rarely clean or consistent. 
If your pipeline starts with office files, scanned PDFs, and multilingual reports, OCR 4 is built to process them without a long chain of format-specific hacks.\u003C\u002Fp>\u003Ch2>Why Mistral is pushing structure, not just accuracy\u003C\u002Fh2>\u003Cp>Mistral is clearly betting that document AI buyers care about more than a \u003Ca href=\"\u002Ftag\u002Fbenchmark\">benchmark\u003C\u002Fa> score. The company positions OCR 4 as an ingestion layer for enterprise search, retrieval-augmented generation, and \u003Ca href=\"\u002Ftag\u002Fagent\">agent\u003C\u002Fa> workflows. That makes sense: once a model can identify block types and confidence levels, it becomes much easier to build citation-ready search, source-grounded answers, and verification steps for humans in the loop.\u003C\u002Fp>\u003Cp>In the company’s own benchmarking section, independent annotators preferred OCR 4 over the systems tested, with win rates averaging 72%. Mistral also says OCR 4 took the top overall score on \u003Ca href=\"https:\u002F\u002Folmocrbench.org\u002F\" target=\"_blank\" rel=\"noopener\">OlmOCRBench\u003C\u002Fa> at 85.20, and scored 93.07 on \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FOmniDocBench\" target=\"_blank\" rel=\"noopener\">OmniDocBench\u003C\u002Fa>. Those are strong numbers, but Mistral is careful to note that benchmark scoring can misread correct outputs when math, columns, or formatting get involved.\u003C\u002Fp>\u003Cblockquote>“We benchmarked OCR 4 against the leading agentic document parsers across a chart and figure dense financial QA dataset and reached equivalent accuracy at roughly 8x lower cost and 17x lower latency.” — Aidan Donohue, AI Engineer at \u003Ca href=\"https:\u002F\u002Fwww.rogo.ai\u002F\" target=\"_blank\" rel=\"noopener\">Rogo\u003C\u002Fa>\u003C\u002Fblockquote>\u003Cp>That quote is useful because it gets past the usual model marketing. 
Cost and latency are the two numbers that decide whether OCR stays in a pilot or moves into production. If a document parser is fast but expensive, finance teams will notice. If it is cheap but sloppy, legal and compliance teams will notice even faster.\u003C\u002Fp>\u003Ch2>The benchmark story is good, but the caveats matter\u003C\u002Fh2>\u003Cp>Mistral spends a lot of space explaining where benchmark numbers can mislead. That is a healthy sign. OCR tasks are messy, and scoring systems often punish correct outputs when the reference data is wrong or the layout is complex. Mistral calls out ground-truth errors, mathematically equivalent notation, equation segmentation, multi-column reading order, and block-type attribution as common sources of mismatch.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782468192178-641w.png\" alt=\"Mistral OCR 4 brings structure to document AI\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>The practical takeaway is that OCR 4 should be judged on your own documents, not on a leaderboard alone. That is especially true if your files include scientific notation, headers and footers, or dense multi-column layouts. A model can look weaker on a benchmark and still be better on your actual corpus, or the reverse can happen.\u003C\u002Fp>\u003Cul>\u003Cli>Human preference testing covered 600+ documents.\u003C\u002Fli>\u003Cli>The documents spanned 12+ languages.\u003C\u002Fli>\u003Cli>OCR 4 led Mistral’s internal Crawl Multilingual evaluation with a score of 0.98.\u003C\u002Fli>\u003Cli>Mistral says it outperformed competing systems across eight language groups, including specialized languages.\u003C\u002Fli>\u003C\u002Ful>\u003Cp>The multilingual angle may be the most practical part of the release. 
Mistral says OCR 4 performs well across English, Western European, Eastern European, Middle Eastern, Chinese, East Asian, Southeast Asian, and specialized languages. That last group includes scripts and languages such as Hindi, Japanese, Georgian, Bengali, Armenian, Hebrew, Greek, Gujarati, Tamil, Malayalam, Kannada, and Telugu. If you work with global archives, that breadth is more than a nice extra.\u003C\u002Fp>\u003Ch2>Pricing and deployment choices are the real product decision\u003C\u002Fh2>\u003Cp>There are now three ways to use the same engine. Developers can call the API directly at $4 per 1,000 pages, use the Batch API at $2 per 1,000 pages for cheaper offline throughput, or use \u003Ca href=\"https:\u002F\u002Fmistral.ai\u002Fproducts\u002Fstudio\" target=\"_blank\" rel=\"noopener\">Mistral Studio\u003C\u002Fa> Document AI at $5 per 1,000 pages for a no-code path. That pricing spread tells you who Mistral wants to reach: builders who need control, ops teams who need throughput, and product teams who want a faster setup.\u003C\u002Fp>\u003Cp>The self-hosted option matters just as much as the price. A single-container deployment means organizations can keep sensitive documents inside their own infrastructure, which is a serious requirement in finance, government, healthcare, and legal work. In those environments, OCR is rarely just an extraction problem. It is a data-handling problem, a compliance problem, and sometimes an audit problem.\u003C\u002Fp>\u003Cp>For comparison, many OCR stacks force teams to choose between a cheap but limited parser and a larger document platform with more moving parts. OCR 4 tries to sit in the middle: structured enough for retrieval and agents, compact enough for self-hosting, and priced low enough to make batch ingestion realistic. 
That mix is likely why Mistral is pairing the API with Document AI inside Studio.\u003C\u002Fp>\u003Cp>One more detail is worth underlining: Mistral says OCR 4 is not meant for medical diagnosis, legal advice, high-stakes financial decisions, safety-critical systems, or non-document inputs like audio and video. That boundary is smart. It prevents the usual AI product overreach and keeps the model framed as infrastructure for document understanding, which is where it actually belongs.\u003C\u002Fp>\u003Ch2>What this means for teams building document pipelines\u003C\u002Fh2>\u003Cp>If you are building search, RAG, invoice extraction, or archive digitization, OCR 4 is the kind of release that can simplify a stack. Instead of bolting together OCR, layout detection, confidence scoring, and manual post-processing, teams get structured output from one model. That does not remove the need for review, but it reduces the amount of glue code between raw documents and usable data.\u003C\u002Fp>\u003Cp>The most interesting question now is whether teams adopt OCR 4 as a point solution or as the front door to a larger Mistral workflow. If the structured output really improves retrieval quality and human verification, expect more document pipelines to start with OCR 4 and end inside search or agent systems. If you are evaluating it, the right test is simple: run your own mixed-language, mixed-format documents through it and compare correction time, not just extraction accuracy.\u003C\u002Fp>\u003Cp>That is where the product will either earn its place or get ignored. The model is cheap enough to try, structured enough to matter, and flexible enough to fit into self-hosted or managed setups. 
The next move is on teams that still treat OCR as a commodity layer rather than the first step in a document intelligence system.\u003C\u002Fp>","Mistral OCR 4 adds boxes, block labels, and confidence scores to OCR, with API pricing from $4 per 1,000 pages.","mistral.ai","https:\u002F\u002Fmistral.ai\u002Fnews\u002Focr-4\u002F",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782468180808-ulcg.png","research","en","cd8b1802-2094-4f5c-89a9-230680124777",[17,18,19,20,21],"Mistral OCR 4","document AI","OCR","RAG","enterprise search",[23,24,25],"Mistral OCR 4 adds bounding boxes, block labels, and confidence scores to OCR output.","The API costs $4 per 1,000 pages, or $2 with Batch API, while Document AI costs $5.","Mistral positions OCR 4 for search, RAG, agents, and self-hosted enterprise deployments.",0,"2026-06-26T10:02:37.910976+00:00","2026-06-26T10:02:37.903+00:00","3103988e-c4fe-45e3-98ab-846500c9d507",{"tags":31,"relatedLang":34,"relatedPosts":38},[32],{"name":20,"slug":33},"rag",{"id":15,"slug":35,"title":36,"language":37},"mistral-ocr-4-document-ai-structure-zh","Mistral OCR 4 把文件變結構化資料","zh",[39,45,51,57,63,69],{"id":40,"slug":41,"title":42,"cover_image":43,"image_url":43,"created_at":44,"category":13},"de74bbd4-e3b6-407a-998b-b38c4170b586","autoregressive-boltzmann-generators-ditch-flows-en","Autoregressive Boltzmann Generators ditch flows","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782455575877-62qe.png","2026-06-26T06:32:30.585573+00:00",{"id":46,"slug":47,"title":48,"cover_image":49,"image_url":49,"created_at":50,"category":13},"c05899fc-dd62-4fad-a249-9748376c1ef2","river-llm-reinforcement-learning-without-answers-en","RiVER trains LLMs without ground-truth 
answers","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782454678234-6mk1.png","2026-06-26T06:17:27.491779+00:00",{"id":52,"slug":53,"title":54,"cover_image":55,"image_url":55,"created_at":56,"category":13},"696a4c45-6c7b-4a78-a947-2dee1ddc4a58","danceopd-on-policy-generative-field-distillation-en","DanceOPD distills image-editing skills into one model","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782453779169-rakb.png","2026-06-26T06:02:33.604728+00:00",{"id":58,"slug":59,"title":60,"cover_image":61,"image_url":61,"created_at":62,"category":13},"2ae923b8-38e0-402a-937e-8e085f6a022d","microsoft-ai-team-collaboration-cfp-2026-en","Microsoft funds AI research on team collaboration","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782415977730-aqdi.png","2026-06-25T19:32:33.635454+00:00",{"id":64,"slug":65,"title":66,"cover_image":67,"image_url":67,"created_at":68,"category":13},"cb071ec2-19f7-44b6-936e-6f37a9c43b33","ai-papers-code-music-rare-disease-en","3 AI papers on code, music, and diagnosis","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782372780798-rpru.png","2026-06-25T07:32:27.739296+00:00",{"id":70,"slug":71,"title":72,"cover_image":73,"image_url":73,"created_at":74,"category":13},"cd6be4d9-484d-4fa6-8736-8a3b564c4477","new-nlp-papers-agent-memory-tool-use-en","New NLP papers map agent memory and tool use","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782371891968-0m9y.png","2026-06-25T07:17:39.682691+00:00",[76,81,86,91,96,101,106,111,116,121],{"id":77,"slug":78,"title":79,"created_at":80},"a2715e72-1fe8-41b3-abb1-d0cf1f710189","ai-predictions-2026-big-changes-en","AI Predictions for 
2026: Brace for Big Changes","2026-03-26T01:25:07.788356+00:00",{"id":82,"slug":83,"title":84,"created_at":85},"8404bd7b-4c2f-4109-9ec4-baf29d88af2b","ml-papers-of-the-week-github-research-desk-en","ML Papers of the Week Turns GitHub Into a Research Desk","2026-03-27T01:11:39.480259+00:00",{"id":87,"slug":88,"title":89,"created_at":90},"87897a94-8065-4464-a016-1f23e89e17cc","ai-ml-conferences-to-watch-in-2026-en","AI\u002FML Conferences to Watch in 2026","2026-03-27T01:51:54.184108+00:00",{"id":92,"slug":93,"title":94,"created_at":95},"6f1987cf-25f3-47a4-b3e6-db0997695be8","openclaw-agents-manipulated-self-sabotage-en","OpenClaw Agents Can Be Manipulated Into Failure","2026-03-28T03:03:18.899465+00:00",{"id":97,"slug":98,"title":99,"created_at":100},"a53571ad-735a-4178-9f93-cb09b699d99c","vega-driving-language-instructions-en","Vega: Driving with Natural Language Instructions","2026-03-28T14:54:04.698882+00:00",{"id":102,"slug":103,"title":104,"created_at":105},"a34581d6-f36e-46da-88bb-582fb3e7425c","personalizing-autonomous-driving-styles-en","Drive My Way: Personalizing Autonomous Driving Styles","2026-03-28T14:54:26.148181+00:00",{"id":107,"slug":108,"title":109,"created_at":110},"2bc1ad7f-26ce-4f02-9885-803b35fd229d","training-knowledge-bases-writeback-rag-en","Training Knowledge Bases with WriteBack-RAG","2026-03-28T14:54:45.643433+00:00",{"id":112,"slug":113,"title":114,"created_at":115},"71adc507-3c54-4605-bbe2-c966acd6187e","packforcing-long-video-generation-en","PackForcing: Efficient Long-Video Generation Method","2026-03-28T14:55:02.646943+00:00",{"id":117,"slug":118,"title":119,"created_at":120},"675942ef-b9ec-4c5f-a997-381250b6eacb","pixelsmile-facial-expression-editing-en","PixelSmile Framework Enhances Facial Expression Editing","2026-03-28T14:55:20.633463+00:00",{"id":122,"slug":123,"title":124,"created_at":125},"6954fa2b-8b66-4839-884b-e46f89fa1bc3","adaptive-block-scaled-data-types-en","IF4: Smarter 4-Bit Quantization That Adapts to Your 
Data","2026-03-31T06:00:36.65963+00:00"]