[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tag-llm-benchmarks":3},{"tag":4,"articles":11,"peer_article_count":42},{"id":5,"name":6,"slug":7,"article_count":8,"description_zh":9,"description_en":10},"ee654d61-465d-4eec-8060-5b4afb694d7b","LLM benchmarks","llm-benchmarks",3,"LLM 基準測試用來比較模型在知識、數學推理、幻覺率、長上下文與對話品質上的表現，像 BenchLM、AIME 這類榜單常反映模型升級的實際差異，也影響選型與部署判斷。","LLM benchmarks compare models across knowledge, math reasoning, hallucination rate, long-context handling, and chat quality. Results from tests like BenchLM or AIME help teams judge real capability, not just model size or release hype.",[12,21,28,35],{"id":13,"slug":14,"title":15,"summary":16,"category":17,"image_url":18,"cover_image":18,"language":19,"created_at":20},"8d3f770c-adc7-454f-957f-8f98633729cf","llm-benchmarks-2026-pick-right-test-zh","LLM 基準別對職能，不再看單一分數","把 2026 LLM 基準分數翻成工作適配度，並附可直接複製的自訂評測模板。","industry","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782620302423-aziz.png","zh","2026-06-28T04:17:57.382761+00:00",{"id":22,"slug":23,"title":24,"summary":25,"category":17,"image_url":26,"cover_image":26,"language":19,"created_at":27},"d686fcc4-b444-4a8e-9c79-477ec86b4c2d","open-source-llms-run-locally-2026-zh","10 款可本地跑的開源 LLM，2026 這樣選","10 款可本地部署的開源 LLM，從 8GB 到 136GB VRAM 都有對應選擇，適合比對推理、寫程式、長上下文與代理任務。","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781222590157-593t.png","2026-06-12T00:02:33.72144+00:00",{"id":29,"slug":30,"title":31,"summary":32,"category":17,"image_url":33,"cover_image":33,"language":19,"created_at":34},"7c188c00-8556-4f77-8a36-ac458322ad19","llm-stats-ai-benchmarks-compare-zh","5 個最值得先看的 AI 基準","300+ 個 AI 基準集中比較，先看 5 項就能判斷模型在推理、寫碼、視覺與工具呼叫上的實力。","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780973269412-nyhe.png","2026-06-09T02:47:22.6013+00:00",{"id":36,"slug":37,"title":38,"summary":39,"category":17,"image_url":40,"cover_image":40,"language":19,"created_at":41},"a7bca854-a4d9-4616-b651-e5d732a63255","5-llm-benchmarks-for-business-buyers-2026-zh","5 個 LLM 基準測試","5 個基準測試幫你判斷模型強弱、看懂分數失真，並選出最適合商務採購的測試。","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779161051251-hgbf.png","2026-05-19T03:23:38.737225+00:00",8]