[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-xiaomi-mimo-v2-omni-perception-action-zh":3,"article-related-xiaomi-mimo-v2-omni-perception-action-zh":32,"series-industry-526c4740-6990-4cda-ad85-02e1cbd8061d":75},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":24,"views":28,"created_at":29,"published_at":30,"topic_cluster_id":31},"526c4740-6990-4cda-ad85-02e1cbd8061d","xiaomi-mimo-v2-omni-perception-action-zh","Xiaomi MiMo-V2-Omni 把感知接到動作","\u003Cp data-speakable=\"summary\">這篇整理 Xiaomi \u003Ca href=\"\u002Fnews\u002Fxiaomi-mimo-v2-5-pro-pricing-benchmarks-limits-zh\">MiMo\u003C\u002Fa>-V2-Omni 的 5 個重點，幫你判斷它是否適合做多模態代理、瀏覽器操作與辦公自動化。\u003C\u002Fp>\u003Cp>Xiaomi 的 MiMo-V2-Omni 不是只會回答問題的模型，而是把看、聽、判斷和執行接在一起。若你想知道它能不能接進代理\u003Ca href=\"\u002Fnews\u002Fai-companies-must-earn-trust-on-jobs-zh\">工作\u003C\u002Fa>流、處理長音訊、讀圖表，甚至直接產出文件，下面 5 點就夠你做初步選擇。它也已提供 \u003Ca href=\"\u002Ftag\u002Fapi\">API\u003C\u002Fa>，輸入每百萬 \u003Ca href=\"\u002Ftag\u002Ftoken\">token\u003C\u002Fa> 0.4 美元、輸出每百萬 token 2 美元。\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>項目\u003C\u002Fth>\u003Cth>規格 A\u003C\u002Fth>\u003Cth>規格 B\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>API 價格\u003C\u002Ftd>\u003Ctd>$0.4 \u002F 百萬輸入 token\u003C\u002Ftd>\u003Ctd>$2 \u002F 百萬輸出 token\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>音訊能力\u003C\u002Ftd>\u003Ctd>連續音訊理解\u003C\u002Ftd>\u003Ctd>超過 10 小時\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>視覺表現\u003C\u002Ftd>\u003Ctd>圖表分析\u003C\u002Ftd>\u003Ctd>接近 Gemini 3，優於 Claude 4.6 Opus\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>影片能力\u003C\u002Ftd>\u003Ctd>原生音訊-影片聯合輸入\u003C\u002Ftd>\u003Ctd>支援情境感知與預測\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>1. 一個模型同時管文字、視覺和語音\u003C\u002Fh2>\u003Cp>MiMo-V2-Omni 的核心賣點，是把文字、圖像、音訊和影片放進同一套基礎模型。Xiaomi 的說法很直接：感知和動作不該拆成好幾個系統，否則代理流程會多出很多接線與轉換成本。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782419571228-z82b.png\" alt=\"Xiaomi MiMo-V2-Omni 把感知接到動作\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>對實際應用來說，這代表模型更適合做「先看懂，再決定，再執行」的任務，而不是只做單輪問答。\u003C\u002Fp>\u003Cul>\u003Cli>同一模型處理多模態輸入\u003C\u002Fli>\u003Cli>適合代理框架與工作流整合\u003C\u002Fli>\u003Cli>降低多系統串接成本\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>2. 圖表與複雜畫面的理解更像辦公助手\u003C\u002Fh2>\u003Cp>Xiaomi 強調它的視覺推理能力，尤其是圖表分析與跨領域視覺任務。官方描述裡，它在這類任務上優於 \u003Ca href=\"\u002Ftag\u002Fclaude\">Claude\u003C\u002Fa> 4.6 Opus，並逼近 \u003Ca href=\"\u002Ftag\u002Fgemini\">Gemini\u003C\u002Fa> 3 這類頂級閉源模型。\u003C\u002Fp>\u003Cp>這種能力不只是展示效果好看，而是能直接影響\u003Ca href=\"\u002Fnews\u002Fmicrosoft-ai-team-collaboration-cfp-2026-zh\">研究\u003C\u002Fa>、報告、簡報和瀏覽器任務的可用性。能讀懂密集圖表的模型，才有機會真的接手辦公場景。\u003C\u002Fp>\u003Cul>\u003Cli>圖表解讀\u003C\u002Fli>\u003Cli>複雜場景推理\u003C\u002Fli>\u003Cli>適合研究與文件工作\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>3. 音訊理解不是只做短句轉寫\u003C\u002Fh2>\u003Cp>MiMo-V2-Omni 的音訊能力不只是在辨識語音。Xiaomi 提到它可做環境聲分類、多人聲分離、音訊與影像聯合推理，還能理解超過 10 小時的連續音訊。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782419571337-echp.png\" alt=\"Xiaomi MiMo-V2-Omni 把感知接到動作\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>這讓它比較像能處理真實世界噪音的助理，而不是只在乾淨錄音裡表現好。對需要長時間監聽、會議摘要或事件回放的場景，這點很關鍵。\u003C\u002Fp>\u003Cul>\u003Cli>環境聲分類\u003C\u002Fli>\u003Cli>多人聲分離\u003C\u002Fli>\u003Cli>音訊與影像聯合推理\u003C\u002Fli>\u003Cli>長音檔理解\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>4. 影片理解把聲音、畫面和時間線綁在一起\u003C\u002Fh2>\u003Cp>這個模型支援原生音訊與影片聯合輸入，重點不是單看逐格畫面，而是把聲音、動作與上下文一起理解。Xiaomi 也提到影片預訓練帶來更好的情境感知與預測能力。\u003C\u002Fp>\u003Cp>換句話說，它比較像在追蹤事件如何發生，而不是只回答「畫面裡有什麼」。這對內容審核、事件回溯、現場分析都更實用。\u003C\u002Fp>\u003Ccode>可想像的用法：\u003Cbr>- 追蹤直播事件的聲畫變化\u003Cbr>- 找出場景中哪裡改變了\u003Cbr>- 預測下一個動作或事件\u003C\u002Fcode>\u003Ch2>5. 真正拉開差距的是瀏覽器和辦公動作\u003C\u002Fh2>\u003Cp>MiMo-V2-Omni 最有意思的地方，在於它不只會理解，還能動手。Xiaomi 說它可以呼叫工具、執行函式、操作 GUI，並接入主流代理框架；示例裡也包含購物、與客服議價、發布 TikTok 影片等瀏覽器任務。\u003C\u002Fp>\u003Cp>辦公流程也被放進同一條路徑。它能透過自然對話產出 Word、Excel、PDF 和 PPT，還能結合網頁搜尋與檔案技能，做出像升學建議這類結構化輸出。\u003C\u002Fp>\u003Cul>\u003Cli>多分頁瀏覽器操作\u003C\u002Fli>\u003Cli>可處理反自動化檢查後的流程恢復\u003C\u002Fli>\u003Cli>可輸出 Word、Excel、PDF、PPT\u003C\u002Fli>\u003Cli>API 入口：platform.xiaomimimo.com\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>哪種適合你\u003C\u002Fh2>\u003Cp>如果你只需要純文字聊天或簡單自動化，MiMo-V2-Omni 可能太重。若你的工作常碰到圖片、音訊、影片、瀏覽器和辦公文件，而且希望一個模型串起整條流程，它就很對題。\u003C\u002Fp>\u003Cp>最適合的是想把「看懂內容」和「實際完成任務」接在一起的團隊。若你的重點只是便宜、快速的文字生成，較小的模型仍可能更划算。\u003C\u002Fp>","5 個重點看懂 Xiaomi MiMo-V2-Omni：一個把視覺、音訊、影片與瀏覽器動作串起來的多模態代理模型。","mimo.mi.com","https:\u002F\u002Fmimo.mi.com\u002Fdocs\u002Fen-US\u002Fnews\u002Flatest\u002Fv2-omni-release",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782419571228-z82b.png","industry","zh","d023a8fa-d96f-40f7-bc2c-31e00f459c29",[17,18,19,20,21,22,23],"Xiaomi","MiMo-V2-Omni","多模態代理","瀏覽器自動化","辦公自動化","音訊理解","影片理解",[25,26,27],"MiMo-V2-Omni 把文字、視覺、音訊和影片整合到同一模型。","它的亮點不只理解內容，還能直接操作瀏覽器與辦公文件。","長音訊、圖表分析和聲畫聯合理解，是它最值得注意的能力。",0,"2026-06-25T20:32:23.53955+00:00","2026-06-25T20:32:23.533+00:00","fe20f6f6-432b-47bf-a410-a5f516d885ed",{"tags":33,"relatedLang":34,"relatedPosts":38},[],{"id":15,"slug":35,"title":36,"language":37},"xiaomi-mimo-v2-omni-perception-action-en","Xiaomi MiMo-V2-Omni turns perception into action","en",[39,45,51,57,63,69],{"id":40,"slug":41,"title":42,"cover_image":43,"image_url":43,"created_at":44,"category":13},"76b34bb5-e4ad-488a-bf7b-03b5c71ecc4b","ethereum-foundation-reorganizes-cuts-54-staff-zh","Ethereum Foundation 裁 54 人重整組織","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782424970250-qsod.png","2026-06-25T22:02:22.954478+00:00",{"id":46,"slug":47,"title":48,"cover_image":49,"image_url":49,"created_at":50,"category":13},"47ecd595-8782-403d-b091-64e0fec5e176","ai-companies-must-earn-trust-on-jobs-zh","AI 公司要贏，先證明自己不會掏空工作","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782416873244-86n5.png","2026-06-25T19:47:25.696056+00:00",{"id":52,"slug":53,"title":54,"cover_image":55,"image_url":55,"created_at":56,"category":13},"1709aaa0-6b69-402d-954c-9b367d30a5f0","microsoft-ai-education-report-adoption-support-zh","微軟：AI 已成教室日常","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782415075444-xfph.png","2026-06-25T19:17:27.883368+00:00",{"id":58,"slug":59,"title":60,"cover_image":61,"image_url":61,"created_at":62,"category":13},"85621665-982c-44b8-aa53-9d7352e51dac","ruffle-keeps-flash-games-playable-zh","Ruffle 讓 Flash 遊戲續命的 5 個關鍵","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782414176328-r7r3.png","2026-06-25T19:02:27.387704+00:00",{"id":64,"slug":65,"title":66,"cover_image":67,"image_url":67,"created_at":68,"category":13},"155a5305-f45e-4ea2-8661-7d0a4e613de4","jalapeno-turns-openai-into-chip-designer-zh","Jalapeño 讓 OpenAI 變晶片公司","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782407899341-wl80.png","2026-06-25T17:17:56.450808+00:00",{"id":70,"slug":71,"title":72,"cover_image":73,"image_url":73,"created_at":74,"category":13},"7d73898b-ddb7-4326-a8a5-94d1afb5311c","anthropic-overseas-data-center-push-right-move-zh","Anthropic 海外資料中心擴張是對的：算力已是全球戰略資產","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782406974421-rxl5.png","2026-06-25T17:02:28.557827+00:00",[76,81,86,91,96,101,106,111,116,121],{"id":77,"slug":78,"title":79,"created_at":80},"ee073da7-28b3-4752-a319-5a501459fb87","ai-in-2026-what-actually-matters-now-zh","2026 AI 真正重要的事","2026-03-26T07:09:12.008134+00:00",{"id":82,"slug":83,"title":84,"created_at":85},"83bd1795-8548-44c9-9a7e-de50a0923f71","trump-ai-framework-power-speech-state-preemption-zh","川普 AI 框架瞄準電力、言論與州權","2026-03-26T07:12:18.695466+00:00",{"id":87,"slug":88,"title":89,"created_at":90},"ea6be18b-c903-4e54-97b7-5f7447a612e0","nvidia-gtc-2026-big-ai-announcements-zh","NVIDIA GTC 2026 重點拆解","2026-03-26T07:14:26.62638+00:00",{"id":92,"slug":93,"title":94,"created_at":95},"4bcec76f-4c36-4daa-909f-54cd702f7c93","claude-users-spreading-out-and-getting-better-zh","Claude 用戶更分散，也更會用","2026-03-26T07:22:52.325888+00:00",{"id":97,"slug":98,"title":99,"created_at":100},"bd903b15-2473-4178-9789-b7557816e535","openclaw-raises-hard-question-for-ai-models-zh","OpenClaw 逼問 AI 模型價值","2026-03-26T07:24:54.707486+00:00",{"id":102,"slug":103,"title":104,"created_at":105},"eeac6b9e-ad9d-4831-8eec-8bba3f9bca6a","gap-google-gemini-checkout-fashion-search-zh","Gap 把結帳搬進 Gemini","2026-03-26T07:28:23.937768+00:00",{"id":107,"slug":108,"title":109,"created_at":110},"0740e53f-605d-4d57-8601-c10beb126f3c","google-pushes-gemini-transition-to-march-2026-zh","Google 把 Gemini 轉換延到 2026 年 3…","2026-03-26T07:30:12.825269+00:00",{"id":112,"slug":113,"title":114,"created_at":115},"e660d801-2421-4529-8fa9-86b82b066990","metas-llama-4-benchmark-scandal-gets-worse-zh","Meta Llama 4 分數風波又擴大","2026-03-26T07:34:21.156421+00:00",{"id":117,"slug":118,"title":119,"created_at":120},"183f9e7c-e143-40bb-a6d5-67ba84a3a8bc","accenture-mistral-ai-sovereign-enterprise-deal-zh","Accenture 攜手 Mistral AI 賣主權 AI","2026-03-26T07:38:14.818906+00:00",{"id":122,"slug":123,"title":124,"created_at":125},"191d9b1b-768a-478c-978c-dd7431a38149","mistral-ai-faces-its-hardest-year-yet-zh","Mistral AI 迎來最硬的一年","2026-03-26T07:40:23.716374+00:00"]