[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tag-模型評測":3},{"tag":4,"articles":10,"peer_article_count":11},{"id":5,"name":6,"slug":6,"article_count":7,"description_zh":8,"description_en":9},"5d1906c5-3a61-4806-9d99-0dffc1aa881f","模型評測",3,"模型評測關注的是 AI 模型在知識、推理、長上下文與真實任務上的表現，也包括 benchmark 是否可信。從分數爭議、提示詞對成績的影響，到不同模型在同一測試上的差異，這類內容幫助開發者判斷模型能否真正上線。","Model evaluation covers how AI systems perform on knowledge, reasoning, long-context tasks, and applied workloads, as well as whether benchmark results are trustworthy. It includes score disputes, prompt sensitivity, and cross-model comparisons that help developers judge real deployment readiness.",[],5]