[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-self-distillation-shrinks-output-diversity-zh":3,"article-related-self-distillation-shrinks-output-diversity-zh":30,"series-research-a875d002-f6f0-4139-abc1-f1602bc42fee":75},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":22,"views":26,"created_at":27,"published_at":28,"topic_cluster_id":29},"a875d002-f6f0-4139-abc1-f1602bc42fee","self-distillation-shrinks-output-diversity-zh","自蒸餾會縮小模型多樣性","\u003Cp data-speakable=\"summary\">這篇論文指出，自蒸餾能拉高 pass@1，卻會壓縮輸出多樣性，讓模型在分布外情境更脆弱。\u003C\u002Fp>\u003Cul>\u003Cli>\u003Cstrong>研究機構\u003C\u002Fstrong>：arXiv 摘要未明確標註\u003C\u002Fli>\u003Cli>\u003Cstrong>核心數據\u003C\u002Fstrong>：摘要無公開 benchmark 數字\u003C\u002Fli>\u003Cli>\u003Cstrong>突破點\u003C\u002Fstrong>：把自蒸餾視為偏置更新\u003C\u002Fli>\u003C\u002Ful>\u003Cp>\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.26091\">Self-Distillation Can Shrink Model Diversity\u003C\u002Fa> 這篇論文在提醒一件事：模型看起來更準，不代表它真的更會想。作者討論的是 on-policy self-distillation，也就是模型同時扮演 teacher 和 student，並用自己抽樣出的正確示範來訓練自己。這種做法很吸引人，因為它有機會提升 pass@1，還不用另外找一個外部老師模型。\u003C\u002Fp>\u003Cp>但代價也很直接。當訓練訊號一直回灌到模型自己偏好的路徑，輸出分布可能會越來越窄。對開發者來說，這不是小問題。因為很多系統真正需要的，不是第一個答案看起來漂亮，而是能產生多個不同但都合理的候選解。\u003C\u002Fp>\u003Ch2>這篇論文在處理什麼痛點\u003C\u002Fh2>\u003Cp>作者要解的是一個很現實的訓練取捨：自蒸餾可以改善平均表現，但會不會同時削弱模型的探索能力。論文把焦點放在 on-policy self-distillation，也就是 teacher 會根據一個抽樣得到的正確示範，對 student 的 rollout 給出 \u003Ca href=\"\u002Ftag\u002Ftoken\">token\u003C\u002Fa>-level 的密集回饋。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782369171288-egwp.png\" alt=\"自蒸餾會縮小模型多樣性\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>這個設計的吸引力在於，它可能在不依賴外部 teacher 的情況下，提升 
pass@1。也就是說，模型第一個吐出的答案更容易對。但作者指出，這背後可能藏著一個副作用：rollout diversity 下降，pass@k 曲線變平。\u003C\u002Fp>\u003Cp>白話講就是，當你多抽幾次時，模型沒有像你期待的那樣冒出更多不同解法，而是一直重複類似的答案。這對需要多路徑推理、搜尋、合成、或多候選挑選的\u003Ca href=\"\u002Fnews\u002Flibghostty-terminal-substrate-agent-workflows-zh\">工作流程\u003C\u002Fa>，都很關鍵。\u003C\u002Fp>\u003Ch2>方法到底怎麼運作\u003C\u002Fh2>\u003Cp>這篇論文的核心設計，是用「抽樣得到的正確示範」來做自蒸餾。teacher 不是單純看 student 的輸出，而是會在某個正確 rollout 的脈絡下，評估另一個 rollout。接著，這些回饋再回到模型自己的訓練過程裡。\u003C\u002Fp>\u003Cp>作者的理論分析把這件事講得更明確：最優的 self-distillation policy，會用一個 pointwise conditional mutual information 分數去傾斜 base distribution。翻成白話，就是訓練訊號不只在獎勵「答對」，還會把機率質量往那些本來就符合模型偏好的答案推過去。\u003C\u002Fp>\u003Cp>這點和理想的 on-policy \u003Ca href=\"\u002Ftag\u002Freinforcement-learning\">reinforcement learning\u003C\u002Fa> 不一樣。論文指出，理想的 RL 設定會保留等價正確 rollout 之間的機率比例。也就是說，只要答案都對，RL 不一定會把分布壓得那麼尖；但 self-distillation 可能會放大原本就存在的機率差距，讓某些模式越來越占優勢。\u003C\u002Fp>\u003Cp>用工程角度看，這代表模型學到的不只是「什麼可行」，還有「它本來就比較常做什麼」。一旦這種偏好被強化，policy 就會變得更 peaked，也更不愛探索。\u003C\u002Fp>\u003Ch2>論文實際證明了什麼\u003C\u002Fh2>\u003Cp>作者同時做了理論和\u003Ca href=\"\u002Fnews\u002F35-nvidia-ai-supercomputers-turn-europe-into-a-lab-zh\">實驗\u003C\u002Fa>分析。理論上，他們指出 sampled demonstrations 的 self-distillation 會導致偏置更新；實驗上，則觀察到 rollout diversity 下降，pass@k 曲線也會變平。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782369172524-ohoi.png\" alt=\"自蒸餾會縮小模型多樣性\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>論文測了兩類任務：一個是受控的 graph path-finding task，另一個是 science question-answering benchmarks。摘要沒有公開完整 \u003Ca href=\"\u002Ftag\u002Fbenchmark\">benchmark\u003C\u002Fa> 數字，所以這裡不能硬列分數；但摘要明確說，自蒸餾模型在平均表現上可以和 RL 相當，甚至更好，同時 functional diversity 和 semantic diversity 
卻更低。\u003C\u002Fp>\u003Cp>這個結果很重要，因為它把「平均分數」和「輸出多樣性」拆開了。只看單一指標時，自蒸餾看起來很有競爭力；但如果你把答案的廣度也算進來，畫面就會變得不一樣。\u003C\u002Fp>\u003Cp>作者還指出，這些自蒸餾模型在需要多樣策略的 out-of-distribution 設定中會失敗。這也合理：當模型過度依賴某一類解法時，遇到分布外輸入，就容易卡在同一套思路裡，錯過原本可行的替代路徑。\u003C\u002Fp>\u003Ch2>對開發者有什麼影響\u003C\u002Fh2>\u003Cp>如果你在做 \u003Ca href=\"\u002Ftag\u002Fagent\">agent\u003C\u002Fa>、推理系統，或任何會一次抽多個候選答案的 pipeline，diversity 不是裝飾品。它會直接影響 beam search、reranking、self-consistency，甚至多樣本選擇到底有沒有價值。\u003C\u002Fp>\u003Cp>這篇論文的警訊是：自蒸餾可能把這些好處吃掉。模型的 pass@1 可能上升，但如果 pass@k 曲線變平，你多給的 \u003Ca href=\"\u002Ftag\u002Finference\">inference\u003C\u002Fa> budget 其實換不到多少額外資訊。對評估設計、訓練策略、以及 leaderboard 解讀方式，這都有直接影響。\u003C\u002Fp>\u003Cp>換句話說，訓練目標有時會掩蓋分布崩縮。當優化一直推模型去強化它本來就喜歡的答案，你可能得到一個更自信、但更單一、也更不耐分布偏移的系統。\u003C\u002Fp>\u003Ch2>這篇論文的限制在哪裡\u003C\u002Fh2>\u003Cp>摘要講得很清楚，這個方法有風險；但它沒有公開完整 benchmark 數字、ablation 細節，或實作層面的更多資訊。所以從 raw 摘要本身，我們只能確認方向，不能精準量化這個 tradeoff 到底有多大。\u003C\u002Fp>\u003Cp>另外，這也不是在否定 self-distillation 本身。摘要同時說了，它在平均表現上可以和 RL 相當或更好。這代表如果你的目標就是把 top-line accuracy 拉高，這條路還是可能有價值。真正的問題是：怎麼保留這些好處，同時避免輸出多樣性塌縮。\u003C\u002Fp>\u003Cp>對實務團隊來說，這篇論文最直接的提醒是：別只看 pass@1。只要你的應用需要多候選推理、穩定的抽樣廣度，或對 out-of-distribution 輸入有韌性，就應該把 diversity metrics 一起納入評估。\u003C\u002Fp>\u003Cp>如果沒有這層檢查，你很可能會以為模型變強了，實際上只是它更會重複自己。\u003C\u002Fp>\u003Cul>\u003Cli>自蒸餾可提升平均表現，但可能壓縮輸出空間。\u003C\u002Fli>\u003Cli>抽樣示範會強化既有偏好，不一定保留多樣正解。\u003C\u002Fli>\u003Cli>多候選與分布外任務，不能只看 pass@1。\u003C\u002Fli>\u003C\u002Ful>","這篇論文指出，自蒸餾能拉高 pass@1，卻會壓縮輸出多樣性，讓模型在分布外情境更脆弱。","arxiv.org","https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.26091",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782369171288-egwp.png","research","zh","17884e8b-86d6-431c-8e83-d628bb4d060a",[17,18,19,20,21],"self-distillation","output diversity","pass@k","out-of-distribution","reinforcement learning",[23,24,25],"自蒸餾可能提升 pass@1，但會讓模型答案變得更集中。","抽樣示範的訓練方式，可能強化模型原本就偏好的路徑。","如果應用需要多候選或 OOD 韌性，diversity 
不能只當附加指標。",0,"2026-06-25T06:32:26.557584+00:00","2026-06-25T06:32:26.547+00:00","0c35a120-52fc-41fc-afa3-d404eb934158",{"tags":31,"relatedLang":34,"relatedPosts":38},[32],{"name":21,"slug":33},"reinforcement-learning",{"id":15,"slug":35,"title":36,"language":37},"self-distillation-shrinks-output-diversity-en","Self-Distillation Can Shrink Model Diversity","en",[39,45,51,57,63,69],{"id":40,"slug":41,"title":42,"cover_image":43,"image_url":43,"created_at":44,"category":13},"2cc1973d-a7a5-4031-8ed3-e05ca5d335fd","ai-papers-code-music-rare-disease-zh","3 篇 AI 論文：程式、音樂、罕病診斷","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782372792462-buxp.png","2026-06-25T07:32:27.274897+00:00",{"id":46,"slug":47,"title":48,"cover_image":49,"image_url":49,"created_at":50,"category":13},"f9ec6d6f-80a9-4a8e-b3ea-1eb5231aa796","new-nlp-papers-agent-memory-tool-use-zh","新 NLP 論文盯上代理記憶與工具使用","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782371888802-40t8.png","2026-06-25T07:17:39.070441+00:00",{"id":52,"slug":53,"title":54,"cover_image":55,"image_url":55,"created_at":56,"category":13},"80a6e921-dfde-4861-ba61-382e195ec94c","revengebench-reverse-engineering-game-policies-zh","RevengeBench：反推遊戲政策的測試框架","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782368284240-86sh.png","2026-06-25T06:17:29.011751+00:00",{"id":58,"slug":59,"title":60,"cover_image":61,"image_url":61,"created_at":62,"category":13},"978e67d0-1acb-479e-af06-9ead35e4eb74","learning-action-priors-cross-embodiment-manipulation-zh","先學動作先驗，再對齊多模態","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782367376604-ffk9.png","2026-06-25T06:02:29.669069+00:00",{"id":64,"slug":65,"title":66,"cover_image":67,"image_url":67,"created_at":68,"category":1
3},"4a0bbfe8-be40-4add-95c8-7ed1d38a641f","opsd-user-feedback-training-loop-zh","OPSD 讓你把點擊變訓練","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782335103935-0efp.png","2026-06-24T21:04:40.411616+00:00",{"id":70,"slug":71,"title":72,"cover_image":73,"image_url":73,"created_at":74,"category":13},"a2242009-98d7-409c-9f22-d825a81fef2e","ultraquant-4bit-kv-caching-agents-zh","UltraQuant：4-bit KV 快取加速長代理","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782331375909-uhyy.png","2026-06-24T20:02:32.549463+00:00",[76,81,86,91,96,101,106,111,116,121],{"id":77,"slug":78,"title":79,"created_at":80},"f18dbadb-8c59-4723-84a4-6ad22746c77a","deepmind-bets-on-continuous-learning-ai-2026-zh","DeepMind 押注 2026 連續學習 AI","2026-03-26T08:16:02.367355+00:00",{"id":82,"slug":83,"title":84,"created_at":85},"f4a106cb-02a6-4508-8f39-9720a0a93cee","ml-papers-of-the-week-github-research-desk-zh","每週 ML 論文清單，為何紅到 GitHub","2026-03-27T01:11:39.284175+00:00",{"id":87,"slug":88,"title":89,"created_at":90},"c4f807ca-4e5f-47f1-a48c-961cf3fc44dc","ai-ml-conferences-to-watch-in-2026-zh","2026 AI 
研討會投稿時程整理","2026-03-27T01:51:53.874432+00:00",{"id":92,"slug":93,"title":94,"created_at":95},"cf046742-efb2-4753-aef9-caed5da5e32e","adaptive-block-scaled-data-types-zh","IF4：神經網路量化的聰明選擇","2026-03-31T06:00:36.990273+00:00",{"id":97,"slug":98,"title":99,"created_at":100},"53a0dc54-0371-4e40-8d5e-74e94a73840c","geometry-aware-similarity-metrics-for-neural-representations-zh","超越距離測量：用微分幾何重新理解神經網路","2026-03-31T06:01:01.241968+00:00",{"id":102,"slug":103,"title":104,"created_at":105},"fee7d472-a775-4b1d-bbc2-1e8bca1bbf8b","on-the-fly-repulsion-in-the-contextual-space-for-rich-divers-zh","讓AI繪圖更有創意：用排斥力提升生成多樣性","2026-03-31T06:01:25.439673+00:00",{"id":107,"slug":108,"title":109,"created_at":110},"a9901203-d69b-447b-8854-15d14eab32b4","vision-aided-beam-prediction-cnn-eca-zh","影像輔助波束預測升級 CNN","2026-04-01T10:00:25.8073+00:00",{"id":112,"slug":113,"title":114,"created_at":115},"b55e7dd4-0a24-4b3d-804d-b0309a03f498","triple-band-fss-mimo-antenna-sub-6-ghz-zh","三頻 FSS MIMO 天線瞄準 sub-6 GHz","2026-04-01T13:18:36.857305+00:00",{"id":117,"slug":118,"title":119,"created_at":120},"f68290bd-e7f3-4b30-ba22-dcd4e0130a66","openclaw-1299-repos-eight-weeks-analysis-zh","OpenClaw 1299 個 Repo 的資料解讀","2026-04-02T05:03:45.208411+00:00",{"id":122,"slug":123,"title":124,"created_at":125},"ed9f80eb-eb02-4d35-8ad4-0ddf428751dd","beam-coherence-aware-combining-mmwave-mimo-zh","毫米波 MIMO 的雙階合併法","2026-04-02T05:27:26.897188+00:00"]