[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-mistral-ocr-4-document-ai-structure-zh":3,"article-related-mistral-ocr-4-document-ai-structure-zh":34,"series-research-cd8b1802-2094-4f5c-89a9-230680124777":77},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":25,"views":30,"created_at":31,"published_at":32,"topic_cluster_id":33},"cd8b1802-2094-4f5c-89a9-230680124777","mistral-ocr-4-document-ai-structure-zh","Mistral OCR 4 把文件變結構化資料","\u003Cp data-speakable=\"summary\">Mistral OCR 4 把掃描文件轉成結構化資料，還附上框線、區塊標籤和信心分數。\u003C\u002Fp>\u003Cp>\u003Ca href=\"https:\u002F\u002Fmistral.ai\u002Fnews\u002Focr-4\u002F\" target=\"_blank\" rel=\"noopener\">Mistral AI\u003C\u002Fa> 推出 \u003Ca href=\"https:\u002F\u002Fmistral.ai\u002Fnews\u002Focr-4\u002F\" target=\"_blank\" rel=\"noopener\">Mistral OCR 4\u003C\u002Fa>。這次不是單純辨識文字而已。它把頁面結構一起吐出來，對文件 AI 很實用。\u003C\u002Fp>\u003Cp>官方給的數字也很\u003Ca href=\"\u002Fnews\u002F15-ai-newsletters-by-use-case-zh\">直接\u003C\u002Fa>。它支援 170 種語言，涵蓋 10 個語言群組。\u003Ca href=\"\u002Ftag\u002Fapi\">API\u003C\u002Fa> 從每 1,000 頁 4 美元起，Batch API 則是 2 美元。若走 \u003Ca href=\"https:\u002F\u002Fmistral.ai\u002Fproducts\u002Fstudio\" target=\"_blank\" rel=\"noopener\">Mistral Studio\u003C\u002Fa> 的 Document AI，價格是每 1,000 頁 5 美元。\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>指標\u003C\u002Fth>\u003Cth>數值\u003C\u002Fth>\u003Cth>意義\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>語言支援\u003C\u002Ftd>\u003Ctd>170 種\u003C\u002Ftd>\u003Ctd>適合跨國文件流程\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>API 價格\u003C\u002Ftd>\u003Ctd>每 1,000 頁 4 美元\u003C\u002Ftd>\u003Ctd>適合高流量 OCR 工作\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Batch API 價格\u003C\u002Ftd>\u003Ctd>每 1,000 頁 2 美元\u003C\u002Ftd>\u003Ctd>適合非同步匯入\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Document AI 價格\u003C\u002Ftd>\u003Ctd>每 1,000 頁 5 美元\u003C\u002Ftd>\u003Ctd>適合不想自己串流程的團隊\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>OlmOCRBench 分數\u003C\u002Ftd>\u003Ctd>85.20\u003C\u002Ftd>\u003Ctd>官方主打的 benchmark 成績\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>OCR 4 不只看字，還看版面\u003C\u002Fh2>\u003Cp>這版最重要的改變，是它回傳的不只是純文字。每個區塊都有 bounding box、block type，還有 inline confidence score。講白了，就是讓系統知道文字在哪裡，也知道它有多可靠。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782468184906-6p2v.png\" alt=\"Mistral OCR 4 把文件變結構化資料\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>這件事對文件 AI 很重要。你在做搜尋、\u003Ca href=\"\u002Ftag\u002Frag\">RAG\u003C\u002Fa>、發票解析、合約抽取時，版面資訊常常比字本身更有用。純文字會把欄位、表格、標題、註記全打散，後面再補回來很痛苦。\u003C\u002Fp>\u003Cp>Mistral 這次的路線也很明確。它不是把 OCR 當成大\u003Ca href=\"\u002Fnews\u002Fdanceopd-on-policy-generative-field-distillation-zh\">模型\u003C\u002Fa>的副產品，而是做成一個小而專的模型。對要處理大量文件的團隊來說，這種設計通常比較務實。\u003C\u002Fp>\u003Cul>\u003Cli>Bounding box 可直接標出文字位置。\u003C\u002Fli>\u003Cli>Block type 有助分辨表格、標題、簽名。\u003C\u002Fli>\u003Cli>Confidence score 可接人工複核流程。\u003C\u002Fli>\u003Cli>單一容器部署，適合內網或私有雲。\u003C\u002Fli>\u003C\u002Ful>\u003Cp>它也支援常見企業格式。像 PDF、DOC、PPT、OpenDocument 都能吃。這點看起來普通，但實務上很重要。真實世界的文件來源通常很雜，不會只給你乾淨掃描檔。\u003C\u002Fp>\u003Ch2>為什麼 Mistral 要強調結構\u003C\u002Fh2>\u003Cp>Mistral 很清楚，它賣的不是單一準確率。它想賣的是文件入口層。也就是把 OCR \u003Ca href=\"\u002Fnews\u002Fschwab-crypto-exposure-theme-list-zh\">變成\u003C\u002Fa>搜尋、RAG、\u003Ca href=\"\u002Ftag\u002Fagent\">Agent\u003C\u002Fa> 工作流的前置資料管線。\u003C\u002Fp>\u003Cp>這個方向合理。只要模型能辨識區塊類型和信心分數，後面就能做引用定位、來源追蹤、人工審核。對企業來說，這比單純吐一坨文字更好用。\u003C\u002Fp>\u003Cp>官方也丟了幾個 \u003Ca href=\"\u002Ftag\u002Fbenchmark\">benchmark\u003C\u002Fa> 數字。它說在 \u003Ca href=\"https:\u002F\u002Folmocrbench.org\u002F\" target=\"_blank\" rel=\"noopener\">OlmOCRBench\u003C\u002Fa> 拿到 85.20，\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fallenai\u002Folmocr\" target=\"_blank\" rel=\"noopener\">OmniDocBench\u003C\u002Fa> 則是 93.07。另有獨立標註者偏好測試，平均勝率 72%。\u003C\u002Fp>\u003Cblockquote>“We benchmarked OCR 4 against the leading agentic document parsers across a chart and figure dense financial QA dataset and reached equivalent accuracy at roughly 8x lower cost and 17x lower latency.” — Aidan Donohue, AI Engineer at \u003Ca href=\"https:\u002F\u002Fwww.rogo.ai\u002F\" target=\"_blank\" rel=\"noopener\">Rogo\u003C\u002Fa>\u003C\u002Fblockquote>\u003Cp>這段引述很有意思。因為它講的不是模型情懷，而是成本和延遲。文件系統能不能進 production，通常就卡在這兩個數字。快但貴，財務會嫌。便宜但不準，法務會先翻白眼。\u003C\u002Fp>\u003Ch2>benchmark 好看，但要看限制\u003C\u002Fh2>\u003Cp>Mistral 也花不少篇幅提醒 benchmark 會騙人。這點我覺得算誠實。OCR 本來就很髒，評分常常會因為 ground truth 錯誤、數學式寫法不同、欄位順序不同而失真。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782468180982-l13m.png\" alt=\"Mistral OCR 4 把文件變結構化資料\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>像多欄排版、公式分段、頁首頁尾，這些都很容易讓分數失真。模型看起來差，不代表它真的差。反過來也一樣。只看 leaderboard，常常會選到不適合自己資料的東西。\u003C\u002Fp>\u003Cp>所以實務上，OCR 4 應該拿你的文件測。不要只看官方分數。你如果手上是合約、研究論文、財報、掃描表單，結果會差很多。\u003C\u002Fp>\u003Cul>\u003Cli>人類偏好測試涵蓋 600+ 文件。\u003C\u002Fli>\u003Cli>測試文件橫跨 12+ 種語言。\u003C\u002Fli>\u003Cli>官方說 Crawl Multilingual 評測拿到 0.98。\u003C\u002Fli>\u003Cli>它在 8 個語言群組都壓過競品。\u003C\u002Fli>\u003C\u002Ful>\u003Cp>多語言支援是這版很實際的賣點。Mistral 說它能處理 English、Western Europe、Eastern Europe、Middle Eastern、Chinese、East Asian、Southeast Asian，還有特殊語言群組。像 Hindi、Japanese、Georgian、Bengali、Armenian、Hebrew、Greek、Gujarati、Tamil、Malayalam、Kannada、Telugu 都在範圍內。\u003C\u002Fp>\u003Ch2>價格和部署，才是團隊真正會看的地方\u003C\u002Fh2>\u003Cp>現在有三種用法。第一種是直接 API，每 1,000 頁 4 美元。第二種是 Batch API，每 1,000 頁 2 美元。第三種是 \u003Ca href=\"https:\u002F\u002Fmistral.ai\u002Fproducts\u002Fstudio\" target=\"_blank\" rel=\"noopener\">Mistral Studio\u003C\u002Fa> 的 Document AI，每 1,000 頁 5 美元。\u003C\u002Fp>\u003Cp>這個價差很有意思。它其實在對三種人說話。開發者要控制力，Ops 要吞吐量，產品團隊要快速上線。Mistral 把這三條路都留著，算是很會賣。\u003C\u002Fp>\u003Cp>自架也很重要。官方說可以單容器部署。這對金融、政府、醫療、法律場景很關鍵。這些地方的 OCR 往往不是抽字問題，而是\u003Ca href=\"\u002Ftag\u002F資料治理\">資料治理\u003C\u002Fa>問題。\u003C\u002Fp>\u003Cp>如果拿來比，很多 OCR 工具不是太簡單，就是太重。OCR 4 想站在中間。它想保留結構化輸出，也想讓自架和批次匯入都能用。這種定位比純 OCR 工具更像文件基礎設施。\u003C\u002Fp>\u003Cp>官方也明講限制。它不適合醫療診斷、法律建議、高風險金融決策、安全關鍵系統，也不處理音訊和影片。這樣講很正常。至少它沒亂吹。\u003C\u002Fp>\u003Ch2>對台灣團隊來說，重點是管線，不是模型名\u003C\u002Fh2>\u003Cp>如果你在做搜尋、RAG、發票抽取、文件歸檔，OCR 4 會很有吸引力。因為它把 OCR、版面分析、信心分數收在一起。少掉很多 glue code，也少掉很多手工補洞。\u003C\u002Fp>\u003Cp>我會建議先拿混合語言文件測。像中英混排、掃描 PDF、表格、章節標題、欄位很多的表單，都丟進去。不要先看準確率，先看人工修正時間。\u003C\u002Fp>\u003Cp>這種產品最後能不能活下來，不是看 demo。是看你能不能把文件處理時間砍半。Mistral OCR 4 至少把工具箱整理得比較像樣了。接下來就看你手上的資料，值不值得把它放進正式流程。\u003C\u002Fp>","Mistral OCR 4 把 OCR 從純文字抽取，改成帶框、標籤與信心分數的文件資料。它支援 170 種語言，API 價格從每 1,000 頁 4 美元起。","mistral.ai","https:\u002F\u002Fmistral.ai\u002Fnews\u002Focr-4\u002F",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782468184906-6p2v.png","research","zh","567f2a82-494e-493a-9d43-00dfbc8a7bfd",[17,18,19,20,21,22,23,24],"Mistral OCR 4","OCR","document AI","文件 AI","bounding box","confidence score","Batch API","Mistral Studio",[26,27,28,29],"OCR 4 把純文字抽取升級成結構化文件資料。","API 價格是每 1,000 頁 4 美元，Batch API 是 2 美元。","它支援 170 種語言，也能單容器自架。","真正該測的是你的文件，不是只有官方 benchmark。",0,"2026-06-26T10:02:37.422252+00:00","2026-06-26T10:02:37.41+00:00","0c35a120-52fc-41fc-afa3-d404eb934158",{"tags":35,"relatedLang":36,"relatedPosts":40},[],{"id":15,"slug":37,"title":38,"language":39},"mistral-ocr-4-document-ai-structure-en","Mistral OCR 4 brings structure to document AI","en",[41,47,53,59,65,71],{"id":42,"slug":43,"title":44,"cover_image":45,"image_url":45,"created_at":46,"category":13},"a90ab5b6-f647-4cef-85af-35ff7bb21a93","autoregressive-boltzmann-generators-ditch-flows-zh","ArBG 改用自回歸做分子採樣","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782455577323-vrvt.png","2026-06-26T06:32:30.056363+00:00",{"id":48,"slug":49,"title":50,"cover_image":51,"image_url":51,"created_at":52,"category":13},"93b19c63-dbfd-4277-92b5-b5a60946fd65","river-llm-reinforcement-learning-without-answers-zh","RiVER 讓 LLM 不靠標準答案也能學","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782454671897-i8l3.png","2026-06-26T06:17:26.979468+00:00",{"id":54,"slug":55,"title":56,"cover_image":57,"image_url":57,"created_at":58,"category":13},"cd38b72e-b309-493d-b36f-684745ff5f7e","danceopd-on-policy-generative-field-distillation-zh","DanceOPD：把修圖技能蒸餾進同一模型","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782453784592-x1gk.png","2026-06-26T06:02:33.123618+00:00",{"id":60,"slug":61,"title":62,"cover_image":63,"image_url":63,"created_at":64,"category":13},"af1a155b-d8e6-4575-a014-959aef283098","microsoft-ai-team-collaboration-cfp-2026-zh","Microsoft 砸錢研究團隊協作 AI","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782415981776-jikr.png","2026-06-25T19:32:33.155576+00:00",{"id":66,"slug":67,"title":68,"cover_image":69,"image_url":69,"created_at":70,"category":13},"2cc1973d-a7a5-4031-8ed3-e05ca5d335fd","ai-papers-code-music-rare-disease-zh","3 篇 AI 論文：程式、音樂、罕病診斷","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782372792462-buxp.png","2026-06-25T07:32:27.274897+00:00",{"id":72,"slug":73,"title":74,"cover_image":75,"image_url":75,"created_at":76,"category":13},"f9ec6d6f-80a9-4a8e-b3ea-1eb5231aa796","new-nlp-papers-agent-memory-tool-use-zh","新 NLP 論文盯上代理記憶與工具使用","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782371888802-40t8.png","2026-06-25T07:17:39.070441+00:00",[78,83,88,93,98,103,108,113,118,123],{"id":79,"slug":80,"title":81,"created_at":82},"f18dbadb-8c59-4723-84a4-6ad22746c77a","deepmind-bets-on-continuous-learning-ai-2026-zh","DeepMind 押注 2026 連續學習 AI","2026-03-26T08:16:02.367355+00:00",{"id":84,"slug":85,"title":86,"created_at":87},"f4a106cb-02a6-4508-8f39-9720a0a93cee","ml-papers-of-the-week-github-research-desk-zh","每週 ML 論文清單，為何紅到 GitHub","2026-03-27T01:11:39.284175+00:00",{"id":89,"slug":90,"title":91,"created_at":92},"c4f807ca-4e5f-47f1-a48c-961cf3fc44dc","ai-ml-conferences-to-watch-in-2026-zh","2026 AI 研討會投稿時程整理","2026-03-27T01:51:53.874432+00:00",{"id":94,"slug":95,"title":96,"created_at":97},"cf046742-efb2-4753-aef9-caed5da5e32e","adaptive-block-scaled-data-types-zh","IF4：神經網路量化的聰明選擇","2026-03-31T06:00:36.990273+00:00",{"id":99,"slug":100,"title":101,"created_at":102},"53a0dc54-0371-4e40-8d5e-74e94a73840c","geometry-aware-similarity-metrics-for-neural-representations-zh","超越距離測量：用微分幾何重新理解神經網路","2026-03-31T06:01:01.241968+00:00",{"id":104,"slug":105,"title":106,"created_at":107},"fee7d472-a775-4b1d-bbc2-1e8bca1bbf8b","on-the-fly-repulsion-in-the-contextual-space-for-rich-divers-zh","讓AI繪圖更有創意：用排斥力提升生成多樣性","2026-03-31T06:01:25.439673+00:00",{"id":109,"slug":110,"title":111,"created_at":112},"a9901203-d69b-447b-8854-15d14eab32b4","vision-aided-beam-prediction-cnn-eca-zh","影像輔助波束預測升級 CNN","2026-04-01T10:00:25.8073+00:00",{"id":114,"slug":115,"title":116,"created_at":117},"b55e7dd4-0a24-4b3d-804d-b0309a03f498","triple-band-fss-mimo-antenna-sub-6-ghz-zh","三頻 FSS MIMO 天線瞄準 sub-6 GHz","2026-04-01T13:18:36.857305+00:00",{"id":119,"slug":120,"title":121,"created_at":122},"f68290bd-e7f3-4b30-ba22-dcd4e0130a66","openclaw-1299-repos-eight-weeks-analysis-zh","OpenClaw 1299 個 Repo 的資料解讀","2026-04-02T05:03:45.208411+00:00",{"id":124,"slug":125,"title":126,"created_at":127},"ed9f80eb-eb02-4d35-8ad4-0ddf428751dd","beam-coherence-aware-combining-mmwave-mimo-zh","毫米波 MIMO 的雙階合併法","2026-04-02T05:27:26.897188+00:00"]