[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tag-swe-bench":3},{"tag":4,"articles":11,"peer_article_count":67},{"id":5,"name":6,"slug":7,"article_count":8,"description_zh":9,"description_en":10},"7f2f1f94-2fd6-4985-9136-b9715dbf8f06","SWE-Bench","swe-bench",11,"SWE-bench 是用真實 GitHub issue 評估程式修復能力的基準，常分成 Verified、Lite 等版本。它反映模型與 agent 是否能讀懂程式庫、定位 bug、修改測試並維持可重現性，也常被用來比較 coding agent 的成本與效率。","SWE-bench is a benchmark for measuring whether models and coding agents can fix real GitHub issues end to end. Its variants, including Verified and Lite, are used to compare bug localization, test-driven edits, and the cost of agentic repair workflows.",[12,21,29,37,45,52,60],{"id":13,"slug":14,"title":15,"summary":16,"category":17,"image_url":18,"cover_image":18,"language":19,"created_at":20},"8d3f770c-adc7-454f-957f-8f98633729cf","llm-benchmarks-2026-pick-right-test-zh","LLM 基準別對職能，不再看單一分數","把 2026 LLM 基準分數翻成工作適配度，並附可直接複製的自訂評測模板。","industry","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782620302423-aziz.png","zh","2026-06-28T04:17:57.382761+00:00",{"id":22,"slug":23,"title":24,"summary":25,"category":26,"image_url":27,"cover_image":27,"language":19,"created_at":28},"09fe28b5-aae5-4bac-b3bd-9a261e4c99a1","mimo-v2-flash-openrouter-benchmarks-pricing-zh","MiMo-V2-Flash 直衝開源 SWE-bench","Xiaomi 的 MiMo-V2-Flash 以 309B MoE 架構登場，OpenRouter 標價每 1M Token 只要 $0.10 \u002F $0.30，並在開源 SWE-bench 分數上衝到前段班。","model-release","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781321565467-96el.png","2026-06-13T03:32:17.367685+00:00",{"id":30,"slug":31,"title":32,"summary":33,"category":34,"image_url":35,"cover_image":35,"language":19,"created_at":36},"d68bf7ed-a36e-4639-bcf0-aa15291a10ce","2026-domain-specific-llm-benchmarks-map-zh","2026 垂直 LLM 基準地圖","Kili Technology 整理 2026 垂直 LLM 基準，涵蓋醫療、法律、金融、程式與資安。重點是通用榜單已不足以分出模型差距，採購與合規開始看專業評測。","research","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779649569393-il2i.png","2026-05-24T19:05:41.374014+00:00",{"id":38,"slug":39,"title":40,"summary":41,"category":42,"image_url":43,"cover_image":43,"language":19,"created_at":44},"21805270-d3b7-4155-8e3f-2c650cef3315","tested-devin-10-tasks-finished-3-zh","我測了 Devin 10 個任務，只做完 3 個","Devin 在 SWE-bench 只拿 13.86%，實測 10 個真實任務也只完成 3 個。這篇拆解它在哪些工作能用、哪些地方會亂掉。","ai-agent","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775167981590-g2tr.png","2026-04-02T22:12:37.165364+00:00",{"id":46,"slug":47,"title":48,"summary":49,"category":26,"image_url":50,"cover_image":50,"language":19,"created_at":51},"c679b51f-194a-463b-87fc-7695256ff752","mimo-v2-pro-vs-omni-vs-flash-2026-zh","MiMo V2 Pro、Omni、Flash 怎麼選","MiMo 2026 三款模型分工很清楚：Flash 主打開源與 coding，Pro 提供 1M context，Omni 則處理圖像、音訊與影片。這篇直接比 benchmark、價格與適用場景。","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775092816164-mhzf.png","2026-04-02T01:18:43.576128+00:00",{"id":53,"slug":54,"title":55,"summary":56,"category":26,"image_url":57,"cover_image":58,"language":19,"created_at":59},"9e1044b4-946d-47fe-9e2a-c2ee032e1164","xiaomi-mimo-v2-pro-1t-moe-agents-zh","小米 MiMo-V2-Pro 登場：1T MoE 模型","小米推出 MiMo-V2-Pro，總參數超過 1T、每 token 啟用 42B，還有 1M context。SWE-bench 成績逼近 Claude Sonnet 4.6，價格卻低很多。",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1774597397882-x4i9.png","2026-03-28T03:06:19.002353+00:00",{"id":61,"slug":62,"title":63,"summary":64,"category":26,"image_url":57,"cover_image":65,"language":19,"created_at":66},"cda76b92-d209-4134-86c1-a60f5bc7b128","xiaomi-mimo-trio-agents-robots-voice-zh","小米 MiMo 三模型瞄準代理、機器人與語音","小米一次推出三款 MiMo AI 模型，涵蓋代理、多模態與語音。MiMo-V2-Pro 以超過 1 兆參數、100 萬 token 上下文，逼近 Claude Opus 4.6 的表現。","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1774498800835-3s4y.png","2026-03-28T03:05:08.779489+00:00",13]