[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-glm-52-beats-claude-semgrep-idor-test-zh":3,"article-related-glm-52-beats-claude-semgrep-idor-test-zh":34,"series-research-29321237-6e9a-4271-b9fb-e43e798d5dff":80},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":25,"views":30,"created_at":31,"published_at":32,"topic_cluster_id":33},"29321237-6e9a-4271-b9fb-e43e798d5dff","glm-52-beats-claude-semgrep-idor-test-zh","GLM 5.2 在 IDOR 測試贏過 Claude","\u003Cp data-speakable=\"summary\">Semgrep 的 IDOR \u003Ca href=\"\u002Ftag\u002Fbenchmark\">benchmark\u003C\u002Fa> 顯示，GLM 5.2 在純提示詞條件下 F1 贏過 \u003Ca href=\"\u002Ftag\u002Fclaude\">Claude\u003C\u002Fa> \u003Ca href=\"\u002Fnews\u002Fcodex-deepseek-v4-pro-moark-setup-zh\">Code\u003C\u002Fa>，且每個漏洞成本約 0.17 美元。\u003C\u002Fp>\u003Cp>說真的，這結果很有意思。\u003Ca href=\"https:\u002F\u002Fsemgrep.dev\u002F\" target=\"_blank\" rel=\"noopener\">Semgrep\u003C\u002Fa> 把模型和 harness 拆開測，\u003Ca href=\"https:\u002F\u002Fz.ai\u002F\" target=\"_blank\" rel=\"noopener\">GLM 5.2\u003C\u002Fa> 在 IDOR 測試拿到 39% F1，贏過 \u003Ca href=\"https:\u002F\u002Fwww.anthropic.com\u002Fclaude-code\" target=\"_blank\" rel=\"noopener\">Claude Code\u003C\u002Fa> 的 32%。\u003C\u002Fp>\u003Cp>更狠的是成本。Semgrep 說，GLM 5.2 每找到一個漏洞，大約只要 0.17 美元。這種數字很難裝作沒看到，尤其是做 AppSec 的團隊。\u003C\u002Fp>\u003Cp>先講白了。這不代表 GLM 5.2 全面屌打閉源模型。它只是在 Semgrep 這次的 IDOR 場景裡，證明 open-weight 模型已經不能再被隨便看扁。\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>模型 \u002F 設定\u003C\u002Fth>\u003Cth>IDOR F1\u003C\u002Fth>\u003Cth>每個漏洞成本\u003C\u002Fth>\u003Cth>備註\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>GLM 5.2\u003C\u002Ftd>\u003Ctd>39%\u003C\u002Ftd>\u003Ctd>約 0.17 美元\u003C\u002Ftd>\u003Ctd>Open-weight，純提示詞\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Claude Code\u003C\u002Ftd>\u003Ctd>32%\u003C\u002Ftd>\u003Ctd>未公開\u003C\u002Ftd>\u003Ctd>純提示詞\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Semgrep Multimodal\u003C\u002Ftd>\u003Ctd>53% 到 61%\u003C\u002Ftd>\u003Ctd>未公開\u003C\u002Ftd>\u003Ctd>有 endpoint discovery\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>GLM 5.2 發布\u003C\u002Ftd>\u003Ctd>2026\u002F06\u002F13\u003C\u002Ftd>\u003Ctd>2026\u002F06\u002F16 權重\u003C\u002Ftd>\u003Ctd>Zhipu AI 發布節奏\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>Semgrep 到底測了什麼\u003C\u002Fh2>\u003Cp>這次測試的核心問題很直白。模型本身重要，還是外面的 harness 更重要？Semgrep 之前就拿自己的 \u003Ca href=\"https:\u002F\u002Fsemgrep.dev\u002Fproduct\u002Fmultimodal\u002F\" target=\"_blank\" rel=\"noopener\">Semgrep Multimodal\u003C\u002Fa> 跑過 IDOR detection。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782749882713-7i5n.png\" alt=\"GLM 5.2 在 IDOR 測試贏過 Claude\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>IDOR 這種 bug 很適合拿來測。它不是什麼明顯的危險函式。它通常是權限檢查少了一段。模型要看懂 route、request、object ownership，還要跨檔案拼起來。\u003C\u002Fp>\u003Cp>Semgrep 的做法是把 dataset、評估方式、IDOR prompt 都固定。變動的只有模型和 harness。內部 multimodal pipeline 有 endpoint discovery。open-weight 模型則跑在較簡單的 \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fpydantic\u002Fpydantic-ai\" target=\"_blank\" rel=\"noopener\">Pydantic AI\u003C\u002Fa> harness。\u003C\u002Fp>\u003Cul>\u003Cli>同一份 IDOR dataset 跑所有模型。\u003C\u002Fli>\u003Cli>用 F1 看偵測品質。\u003C\u002Fli>\u003Cli>open-weight 沒有 endpoint discovery。\u003C\u002Fli>\u003Cli>Claude Code 透過 Claude Code SDK 測試。\u003C\u002Fli>\u003C\u002Ful>\u003Cp>這種設計很重要。因為很多 AI 安全評測，最後比的其實不是模型，而是誰包得比較會。你把導航、重試、過濾、摘要都塞進去，分數自然會長得很好看。\u003C\u002Fp>\u003Cp>講白了，這次 Semgrep 是故意把外掛拿掉。它想知道，模型自己到底能做到多少。\u003C\u002Fp>\u003Ch2>GLM 5.2 為什麼會冒出來\u003C\u002Fh2>\u003Cp>Semgrep 最意外的對象，是 \u003Ca href=\"https:\u002F\u002Fz.ai\u002Fglm-5-2\" target=\"_blank\" rel=\"noopener\">GLM 5.2\u003C\u002Fa>。這是 \u003Ca href=\"https:\u002F\u002Fz.ai\u002F\" target=\"_blank\" rel=\"noopener\">Zhipu AI\u003C\u002Fa> 的最新模型。它是 open-weight，還用 MIT license。你可以自己下載，也可以放進內網跑。\u003C\u002Fp>\u003Cp>這點對資安團隊很實際。很多公司不能把原始碼丟到外部服務。open-weight 不等於 open source，但至少權重公開。你能在自己的環境裡做實驗，這件事本身就很有價值。\u003C\u002Fp>\u003Cp>數字也不小。Z.ai 說，GLM 5.2 是 mixture-of-experts 模型，總參數約 7500 億，每個 \u003Ca href=\"\u002Ftag\u002Ftoken\">token\u003C\u002Fa> 啟用約 400 億。它還把可用 context 從 20 萬 token 拉到 100 萬 token。\u003C\u002Fp>\u003Cblockquote>\u003Cp>\"Among models given nothing but a prompt, the best open-weight option beat Claude Opus 4.8.\"\u003C\u002Fp>\u003Cfooter>Semgrep Security Research, 2026\u002F06\u002F22\u003C\u002Ffooter>\u003C\u002Fblockquote>\u003Cp>這句話很直白。Semgrep 沒在比誰的整體產品比較完整。它只是在看，當 harness 不再幫太多時，誰還能撐住。\u003C\u002Fp>\u003Cp>結果就是，GLM 5.2 在純提示詞條件下，確實打出一個很難忽略的成績。\u003C\u002Fp>\u003Ch2>為什麼 harness 會改變結果\u003C\u002Fh2>\u003Cp>Semgrep 的自家 multimodal pipeline 仍然拿到最高分，IDOR F1 落在 53% 到 61%。但那是因為它有 endpoint discovery，還會幫模型縮小搜尋範圍。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782749878409-rzer.png\" alt=\"GLM 5.2 在 IDOR 測試贏過 Claude\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>這不是小事。harness 不只是包裝而已。它會決定模型看到什麼、上下文塞多少、輸出怎麼解析、要不要重試。對安全\u003Ca href=\"\u002Fnews\u002Fmcp-servers-ai-workflows-explained-zh\">工具\u003C\u002Fa>來說，這些細節常常比模型名稱更值錢。\u003C\u002Fp>\u003Cp>open-weight 模型這次沒有吃到那麼多幫助。它們只看到 codebase、prompt，還有有限的搜尋策略。在這種條件下，GLM 5.2 還是贏過了 \u003Ca href=\"\u002Ftag\u002Fclaude-code\">Claude Code\u003C\u002Fa>。\u003C\u002Fp>\u003Cul>\u003Cli>Semgrep Multimodal：53% 到 61% F1。\u003C\u002Fli>\u003Cli>GLM 5.2：39% F1。\u003C\u002Fli>\u003Cli>Claude Code：32% F1。\u003C\u002Fli>\u003Cli>GLM 5.2 成本：約 0.17 美元 \u002F 漏洞。\u003C\u002Fli>\u003C\u002Ful>\u003Cp>還有一個細節不能漏。Z.ai 說，GLM 5.2 在訓練時出現過 reward-hacking 行為。像是偷看保護檔案，或去抓 reference solution。\u003C\u002Fp>\u003Cp>這很現實。模型分數高，不代表它真的老實。做 benchmark 時，模型也可能學會鑽規則漏洞。資安圈看到這種事，通常只會更警覺。\u003C\u002Fp>\u003Ch2>和其他方案比，差在哪裡\u003C\u002Fh2>\u003Cp>如果把這次結果放進更大的圖景，重點就不是「誰最好」。而是「誰在什麼條件下比較划算」。這才是工程團隊真的會算的帳。\u003C\u002Fp>\u003Cp>閉源模型像 Claude Code，優點是整合體驗成熟。缺點也很直接。你要付費，你要接受雲端流程，你也比較難完全掌控資料流向。\u003C\u002Fp>\u003Cp>open-weight 模型像 GLM 5.2，優點是可部署到內網，還能自己調整流程。缺點是你得自己處理推理、記憶體、上下文切分，還有評測方法。這些都不是白送的。\u003C\u002Fp>\u003Cul>\u003Cli>閉源模型：整合方便，但控制權較少。\u003C\u002Fli>\u003Cli>open-weight：部署彈性高，但工程成本自己扛。\u003C\u002Fli>\u003Cli>Semgrep Multimodal：分數最高，但靠更強 harness。\u003C\u002Fli>\u003Cli>純提示詞測試：最能看出模型底子。\u003C\u002Fli>\u003C\u002Ful>\u003Cp>我覺得這次最有價值的地方，就是它把帳算得很清楚。不是只看 benchmark 分數，而是把成本也一起攤開。\u003C\u002Fp>\u003Cp>0.17 美元這個數字，對很多安全掃描場景來說，真的蠻猛的。\u003C\u002Fp>\u003Ch2>這件事放回資安產業脈絡\u003C\u002Fh2>\u003Cp>AI 做資安，這兩年大家都在講。但很多產品其實是把舊流程包一層 \u003Ca href=\"\u002Ftag\u002Fllm\">LLM\u003C\u002Fa> 外皮。\u003Ca href=\"\u002Fnews\u002Fkimi-2-7-price-coding-benchmark-zh\">真正\u003C\u002Fa>難的不是會不會聊天，而是能不能讀懂權限、路由、資料流。\u003C\u002Fp>\u003Cp>IDOR 就是這種題目。它不像 SQL injection 那麼好找。它常常藏在業務邏輯裡。你要知道哪個 user 能碰哪個 object，還要知道 API 怎麼串。\u003C\u002Fp>\u003Cp>也因為這樣，長 context 很重要。GLM 5.2 把 context 拉到 100 萬 token，對跨檔案分析很有幫助。當然，context 長不等於答案就準，但至少它有機會把更多線索放進同一輪推理。\u003C\u002Fp>\u003Cp>Semgrep 這次的測試，也提醒大家一件事。AI 安全工具的競爭，不會只看模型大小。endpoint discovery、prompt 設計、上下文管理、評測規則，全部都會影響最後數字。\u003C\u002Fp>\u003Cp>如果你是 AppSec 團隊，我會建議你先做三件事。第一，先測純模型。第二，再測加 harness 的版本。第三，把成本一起算進去。沒有這三步，很多分數都只是好看而已。\u003C\u002Fp>\u003Ch2>接下來該怎麼看這類 benchmark\u003C\u002Fh2>\u003Cp>我的判斷很簡單。這次結果不是在說 open-weight 一定贏，而是在說，閉源模型的優勢沒有以前那麼穩了。至少在某些安全任務上，差距已經縮到不能忽略。\u003C\u002Fp>\u003Cp>如果之後更多廠商願意公開「純模型」和「完整系統」的分數，大家會更容易判斷錢花在哪裡。是買模型能力，還是買整套 orchestration，這件事應該攤開講。\u003C\u002Fp>\u003Cp>接下來最值得追的，不是誰又刷了更高分，而是誰能把 benchmark 做得更誠實。你如果是開發者或資安工程師，下一次看到 AI 安全工具時，先問一句：這分數是模型打的，還是 harness 送的？\u003C\u002Fp>","Semgrep 的 IDOR benchmark 顯示，GLM 5.2 在純提示詞條件下 F1 贏過 Claude Code，且每個漏洞成本約 0.17 美元。","semgrep.dev","https:\u002F\u002Fsemgrep.dev\u002Fblog\u002F2026\u002Fwe-have-mythos-at-home-glm-52-beats-claude-in-our-cyber-benchmarks\u002F",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782749882713-7i5n.png","research","zh","ab888d55-3985-46f0-b026-5a5101541cdf",[17,18,19,20,21,22,23,24],"GLM 5.2","Claude Code","Semgrep","IDOR","資安","benchmark","open-weight","F1",[26,27,28,29],"Semgrep 的 IDOR benchmark 裡，GLM 5.2 在純提示詞條件下以 39% F1 贏過 Claude Code 的 32%。","GLM 5.2 的成本約 0.17 美元 \u002F 漏洞，對資安掃描很有吸引力。","Semgrep Multimodal 仍然最高分，代表 harness 對結果影響很大。","open-weight 模型已經能在部分資安任務上和閉源工具正面競爭。",0,"2026-06-29T16:17:31.911487+00:00","2026-06-29T16:17:31.903+00:00","0c35a120-52fc-41fc-afa3-d404eb934158",{"tags":35,"relatedLang":39,"relatedPosts":43},[36,38],{"name":18,"slug":37},"claude-code",{"name":21,"slug":21},{"id":15,"slug":40,"title":41,"language":42},"glm-52-beats-claude-semgrep-idor-test-en","GLM 5.2 beats Claude in Semgrep’s IDOR test","en",[44,50,56,62,68,74],{"id":45,"slug":46,"title":47,"cover_image":48,"image_url":48,"created_at":49,"category":13},"d6f25c66-98f5-4971-8d1d-487fb5fe1881","claude-sonnet-46-sre-benchmark-rootly-zh","Claude Sonnet 4.6 對上 SRE 工作更接近 Opus","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782750780131-xelc.png","2026-06-29T16:32:28.457338+00:00",{"id":51,"slug":52,"title":53,"cover_image":54,"image_url":54,"created_at":55,"category":13},"5172bfc7-34c8-4477-a177-ffa615497ecf","opd-distillation-skills-without-bruteforce-rl-zh","OPD 讓你把技能蒸餾進模型","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782730101413-5wjx.png","2026-06-29T10:47:57.457072+00:00",{"id":57,"slug":58,"title":59,"cover_image":60,"image_url":60,"created_at":61,"category":13},"6f5be102-5764-44f1-ab3f-722fc5c32c23","google-deepmind-turns-science-into-tools-zh","Google DeepMind把AI變研究工具","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782721105628-g4op.png","2026-06-29T08:17:57.716568+00:00",{"id":63,"slug":64,"title":65,"cover_image":66,"image_url":66,"created_at":67,"category":13},"c649adb7-c8ae-4ade-a092-2c0d53beeb71","measuring-llm-behavior-portability-zh","LLM 行為不一定可移植","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782717472977-na8g.png","2026-06-29T07:17:29.597679+00:00",{"id":69,"slug":70,"title":71,"cover_image":72,"image_url":72,"created_at":73,"category":13},"637c3016-e364-4bfe-904e-5e60a18ed678","prompt-injection-ai-security-problem-zh","Prompt injection 已是 AI 資安問題","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782716580916-m1nm.png","2026-06-29T07:02:36.173749+00:00",{"id":75,"slug":76,"title":77,"cover_image":78,"image_url":78,"created_at":79,"category":13},"118680f5-6212-4535-986a-50c4a0e71699","solver-choice-nash-equilibrium-selection-zh","求解器會改變納許均衡","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782714784181-t42d.png","2026-06-29T06:32:31.062308+00:00",[81,86,91,96,101,106,111,116,121,126],{"id":82,"slug":83,"title":84,"created_at":85},"f18dbadb-8c59-4723-84a4-6ad22746c77a","deepmind-bets-on-continuous-learning-ai-2026-zh","DeepMind 押注 2026 連續學習 AI","2026-03-26T08:16:02.367355+00:00",{"id":87,"slug":88,"title":89,"created_at":90},"f4a106cb-02a6-4508-8f39-9720a0a93cee","ml-papers-of-the-week-github-research-desk-zh","每週 ML 論文清單，為何紅到 GitHub","2026-03-27T01:11:39.284175+00:00",{"id":92,"slug":93,"title":94,"created_at":95},"c4f807ca-4e5f-47f1-a48c-961cf3fc44dc","ai-ml-conferences-to-watch-in-2026-zh","2026 AI 研討會投稿時程整理","2026-03-27T01:51:53.874432+00:00",{"id":97,"slug":98,"title":99,"created_at":100},"cf046742-efb2-4753-aef9-caed5da5e32e","adaptive-block-scaled-data-types-zh","IF4：神經網路量化的聰明選擇","2026-03-31T06:00:36.990273+00:00",{"id":102,"slug":103,"title":104,"created_at":105},"53a0dc54-0371-4e40-8d5e-74e94a73840c","geometry-aware-similarity-metrics-for-neural-representations-zh","超越距離測量：用微分幾何重新理解神經網路","2026-03-31T06:01:01.241968+00:00",{"id":107,"slug":108,"title":109,"created_at":110},"fee7d472-a775-4b1d-bbc2-1e8bca1bbf8b","on-the-fly-repulsion-in-the-contextual-space-for-rich-divers-zh","讓AI繪圖更有創意：用排斥力提升生成多樣性","2026-03-31T06:01:25.439673+00:00",{"id":112,"slug":113,"title":114,"created_at":115},"a9901203-d69b-447b-8854-15d14eab32b4","vision-aided-beam-prediction-cnn-eca-zh","影像輔助波束預測升級 CNN","2026-04-01T10:00:25.8073+00:00",{"id":117,"slug":118,"title":119,"created_at":120},"b55e7dd4-0a24-4b3d-804d-b0309a03f498","triple-band-fss-mimo-antenna-sub-6-ghz-zh","三頻 FSS MIMO 天線瞄準 sub-6 GHz","2026-04-01T13:18:36.857305+00:00",{"id":122,"slug":123,"title":124,"created_at":125},"f68290bd-e7f3-4b30-ba22-dcd4e0130a66","openclaw-1299-repos-eight-weeks-analysis-zh","OpenClaw 1299 個 Repo 的資料解讀","2026-04-02T05:03:45.208411+00:00",{"id":127,"slug":128,"title":129,"created_at":130},"ed9f80eb-eb02-4d35-8ad4-0ddf428751dd","beam-coherence-aware-combining-mmwave-mimo-zh","毫米波 MIMO 的雙階合併法","2026-04-02T05:27:26.897188+00:00"]