[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-opd-distillation-skills-without-bruteforce-rl-zh":3,"article-related-opd-distillation-skills-without-bruteforce-rl-zh":30,"series-research-5172bfc7-34c8-4477-a177-ffa615497ecf":74},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":22,"views":26,"created_at":27,"published_at":28,"topic_cluster_id":29},"5172bfc7-34c8-4477-a177-ffa615497ecf","opd-distillation-skills-without-bruteforce-rl-zh","OPD 讓你把技能蒸餾進模型","\u003Cp data-speakable=\"summary\">OPD 是把強模型的能力，搬進你自己的後訓練流程的一種實作法。\u003C\u002Fp>\u003Cp>我最近一直在看後訓練怎麼做，越看越火大。很多團隊一遇到模型要更會推理、會用工具、會守住領域行為，第一反應還是：丟更多 RL、燒更多 sampling、然後祈禱 reward 不要發瘋。這招不是完全沒用，但很常把人帶進一個很熟的坑：訓練不穩、回退奇怪、算力燒掉一堆，最後只是把別人早就會的東西，自己再學一次。\u003C\u002Fp>\u003Cp>我真正開始注意 OPD，是因為它把一個我很想要、但以前總覺得卡卡的方向講清楚了：不要把 \u003Ca href=\"\u002Ftag\u002Freinforcement-learning\">reinforcement learning\u003C\u002Fa> 當成唯一正解，而是想辦法把更強的 policy 能力，穩穩搬到目標模型上。這不是魔法，也不是白拿，但它比「每次都從零靠探索硬拚」務實太多。\u003C\u002Fp>\u003Cp>這篇拆解的觸發點，是知乎上的 \u003Ca href=\"https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F2037285722151989443\">On-Policy Distillation (OPD)：起源、发展路线与当今现状\u003C\u002Fa>。它把 OPD 講成後訓練裡的能力遷移方法，還順手點到 \u003Ca href=\"https:\u002F\u002Fqwenlm.github.io\u002F\">Qwen\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FMiMo-AI\u002FMiMo\">MiMo-V2\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Fwww.deepseek.com\u002F\">DeepSeek\u003C\u002Fa> 這些公開技術報告的脈絡。原文沒有給 bookmark 或 view 數，我就不亂掰。\u003C\u002Fp>\u003Ch2>OPD 其實是在說：別再逼模型自己摸索了\u003C\u002Fh2>\u003Cblockquote>On-Policy 
Distillation 正在成为后训练中的重要能力迁移工具。\u003C\u002Fblockquote>\u003Cp>白話翻譯就是：與其叫一個較小或目標模型自己試錯，試到會為止，不如讓它在接近真實推理情境的分佈裡，直接學一個更強 policy 的做法。這裡的 on-policy，不是學術裝飾詞，是重點。學生看到的不是靜態標註資料，而是目前這個 policy 狀態下，真的會出現的輸出。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782730101413-5wjx.png\" alt=\"OPD 讓你把技能蒸餾進模型\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>我之前在調一個偏結構化推理的模型時，就很有感。純 SFT 會讓它變得很有禮貌，但不會變強；RL 有時候會突然變聰明，有時候又把行為搞得很髒。問題常常不是模型不會，而是它一離開那個訓練設定，就把老師的好習慣丟掉了。OPD 吸引我的點就在這：它不是只教答案，它是在教「在這種情境下怎麼做事」。\u003C\u002Fp>\u003Cp>實操上，我會這樣想：如果你手上已經有一個強 teacher，就不要把它的輸出當成死板的 gold label。你要在跟部署相近的 prompt 格式、工具上下文、任務混合裡去產生它的 trace。然後拿這些 policy-conditioned traces 去訓練學生。若你本來就有 preference data，也不是不能混，但核心要守住一件事：資料要貼著 live policy distribution，不要退回成凍結的離線樣本。\u003C\u002Fp>\u003Cul>\u003Cli>teacher 已經會你要的技能時，OPD 很適合。\u003C\u002Fli>\u003Cli>RL 太吵、太慢、太貴時，OPD 通常更好落地。\u003C\u002Fli>\u003Cli>你在意的是實際 prompt 下的行為一致性，不只是 benchmark 分數時，OPD 特別有用。\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>真正的轉變不是「訓練」，是「搬運」\u003C\u002Fh2>\u003Cp>我覺得 OPD 最有價值的地方，是它把後訓練的心態從「讓模型自己發現」拉回「把既有能力搬過去」。這差很多。前者很浪漫，後者很像工程。前者假設模型要靠 reward optimization 自己撞出所有有用策略，後者直接問：既然更好的 policy 已經存在，幹嘛假裝學生要從零重想一次？\u003C\u002Fp>\u003Cp>原文把這個趨勢連到 \u003Ca href=\"https:\u002F\u002Fqwenlm.github.io\u002F\">Qwen3\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FMiMo-AI\u002FMiMo\">MiMo-V2\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Fwww.deepseek.com\u002F\">DeepSeek-V4\u003C\u002Fa> 這類公開報告。我不硬塞那些報告沒寫的細節，但整個方向其實很明顯：現在很多團隊不是只靠 RL 硬推，而是在 pretraining 之後，更認真地做 capability transfer。這代表後訓練已經不是單一演算法問題，而是系統設計問題。\u003C\u002Fp>\u003Cp>這對開發者的意思很直接：你做的不是「調一個 reward function 然後祈禱」，而是設計一條 transfer pipeline。哪個 teacher、什麼 prompt、哪些 rollout、怎麼 filter、學生的 objective 怎麼設，這些都要一起想。teacher 不只是 reference model，它是訓練基礎設施的一部分。\u003C\u002Fp>\u003Cp>我喜歡這種說法，因為它比較誠實。大多數團隊根本不需要模型「發現」怎麼回答一個 coding prompt 或 domain 
question，他們需要的是：在自己的限制下，穩定重現一個已知好行為。這就是搬運問題，不是發明問題。OPD 比起那些把 RL 講得像修仙的說法，貼近現實多了。\u003C\u002Fp>\u003Cp>實操寫法很簡單：先把你要搬的能力寫死，不要寫成「更會推理」這種空話。要寫成「多步數學時少出無效中間步驟」或「工具選擇時避免重複呼叫」這種可觀測行為。接著檢查 teacher 能不能在學生未來會看到的 prompt regime 裡，真的吐出這種行為。如果不行，先修 teacher prompting，再碰 student。\u003C\u002Fp>\u003Ch2>on-policy 重要，是因為 off-policy 很容易過期\u003C\u002Fh2>\u003Cp>「on-policy」這個詞看起來像教科書術語，但它其實在講一個很實際的痛點：學生不是從過去某批老資料學，而是從目前這個 policy 分佈下產生的資料學。這種差異看起來很小，debug 的時候卻超致命。你以為資料沒問題，結果只是資料在騙你。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782730098664-ce3k.png\" alt=\"OPD 讓你把技能蒸餾進模型\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>我在 instruction tuning 上也踩過類似的雷。資料集做得很漂亮，訓練看起來也順，結果一上 production，prompt 形狀稍微變一下，模型就開始失靈。現在把這件事放大到 policy 每幾千 step 就變一次，stale data 直接變成風險。OPD 想做的，就是讓 teacher-student loop 跟目前的行為邊界保持同步。\u003C\u002Fp>\u003Cp>當然，這裡有代價。on-policy collection 比離線蒸餾更貴，也更麻煩。你要有生成基礎設施、過濾機制，還要想辦法不要讓爛 sample 汙染學生。但這個成本換來的是 relevance。學生看到的例子，會更接近它實際部署時會遇到的行為，這也是為什麼這方法現在在後訓練討論裡一直冒出來。\u003C\u002Fp>\u003Cp>實操寫法：不要一次 dump 一大包資料就收工。做一個 rolling generator，從目前任務分佈抽 prompt，跑 teacher 或 expert policy，做品質過濾，再拿新鮮 trace 訓練學生。如果你還要混老資料，請把它當錨點，不要當主菜。學生 policy 變動很大時，rollout 也要定期刷新。\u003C\u002Fp>\u003Cul>\u003Cli>新鮮 rollouts 能降低訓練與部署之間的分佈落差。\u003C\u002Fli>\u003Cli>過期的 teacher traces 會默默卡住上限。\u003C\u002Fli>\u003Cli>rolling generation 比較貴，但通常比較不會騙人。\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>OPD 卡在 SFT 和 RL 中間，剛好有用\u003C\u002Fh2>\u003Cp>很多人會一直重複發現 OPD，原因其實很土：它站在一個很實用的中間地帶。SFT 穩，但常常只是在表面上變聰明；RL 可以再往前推，但通常又吵又難控。OPD 把兩邊真的有用的部分拿來：imitation 的方向感，加上 online training 的 policy awareness。\u003C\u002Fp>\u003Cp>這個中間地帶特別適合你已經信任 teacher 的時候。如果 teacher 是領域專家模型、較強的通用模型，或你自己精調過的內部 policy，那問題通常不是探索，而是壓縮。你要把行為塞進一個更便宜、更快的模型裡，還不能掉太多訊號。這時候蒸餾本來就合理，on-policy 版本只是讓壓縮更貼近真實使用。\u003C\u002Fp>\u003Cp>我猜這就是為什麼 OPD 會在後期模型工作裡一直出現。當團隊有了一個還不錯的 base 
model，瓶頸通常不再是「模型根本不會」，而是「它在限制條件下能不能穩穩照你要的方式做事」。如果產品依賴的是一致輸出、工具呼叫、結構化推理，這條路就很值得試。\u003C\u002Fp>\u003Cp>實操寫法：如果你在 SFT、RL、OPD 之間猶豫，先問自己現在的失敗型態是什麼。模型如果是完全不懂，先 SFT。模型如果懂但行為不穩、需要 policy shaping，OPD 值得先試。只有在你真的需要模型自己找新策略，而且 reward 訊號可信的時候，RL 才該站到主位。我不會再預設 RL 要扛所有後訓練工作。\u003C\u002Fp>\u003Ch2>我會把 OPD 當成一條流水線，不是口號\u003C\u002Fh2>\u003Cp>這裡是很多 writeup 最愛講得很玄、但我覺得最該講白的地方：OPD 是 pipeline，不是 slogan。你要它有用，就得做一條很無聊但很穩的 loop。teacher 先產生輸出，資料要在目前 policy 分佈下收集，接著過濾垃圾，學生吃剩下的 trace，然後再跑一次。\u003C\u002Fp>\u003Cp>原文更大的意思，是這種 capability transfer 會越來越像後訓練的核心工具。我同意，而且原因很工程：這個 loop 比純 reward maximization 更好理解，尤其當你的任務本來就很清楚時。你可以直接看 teacher 的輸出、可以 audit failure、可以改 prompt template，然後立刻看到資料變化。\u003C\u002Fp>\u003Cp>這個「可檢查」的特性很重要。我的經驗是，很多 training run 的 aggregate metrics 看起來不錯，sample 裡卻藏了一堆垃圾。OPD 剛好相反，sample 本身就是產品。teacher 爛，student 就學壞；filter 爛，noise 就被吞下去；rollout 分佈錯，整條線都會歪。\u003C\u002Fp>\u003Cp>實操寫法：先做一條很樸素的 production-like loop。第一，定 prompt schema。第二，在同一 schema 下生成 teacher rollouts。第三，用明確規則做 score 或 filter。第四，訓練 student。第五，在跟部署形狀相近的 held-out prompts 上重評估。如果 student 分數有上升，但 sample 品質很醜，先修資料管線，不要急著加更多訓練技巧。\u003C\u002Fp>\u003Ch2>為什麼現在很多模型報告都往這裡靠\u003C\u002Fh2>\u003Cp>原文提到 \u003Ca href=\"https:\u002F\u002Fqwenlm.github.io\u002F\">Qwen3\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FMiMo-AI\u002FMiMo\">MiMo-V2\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Fwww.deepseek.com\u002F\">DeepSeek-V4\u003C\u002Fa>，我不把這些名字當神諭，我把它們當訊號。大型模型團隊正在收斂到同一個結論：後訓練不只是把 RL 再擠一點，而是要系統化地把能力從強 policy 搬到更便宜、更窄、或更好部署的模型上。\u003C\u002Fp>\u003Cp>這很合理，因為你真的管過 production model pipeline，就知道沒有人想每次改進都靠巨大的探索預算。你要的是一個可重複的方法，把行為跨模型、跨訓練階段地搬過去。OPD 至少在概念上提供了這件事。它不一定是單一演算法，更像是一種後訓練設計模式。\u003C\u002Fp>\u003Cp>好處是這種模式可以跟著成熟度一起長大。早期你可以拿它把強 teacher 壓進小 student；後期你可以拿它去補 domain shift、tool change、prompt change 之後的行為更新。換句話說，它不是只能做一次的壓縮招，而是可以變成維護流程的一部分。\u003C\u002Fp>\u003Cp>實操寫法：把 OPD 當成可重用的後訓練 primitive。把 teacher version、prompt format、rollout policy、filter rules、student checkpoint 全部記下來。等你一個月後回頭看，應該能直接知道哪裡變了，而不是翻 log 
翻到懷疑人生。如果做不到，這流程就還不夠穩。\u003C\u002Fp>\u003Ch2>可抄的模板\u003C\u002Fh2>\u003Cpre>\u003Ccode># On-Policy Distillation playbook\n\n## Goal\nTransfer one concrete capability from a stronger teacher policy into a target student model without relying on pure RL exploration.\n\n## When I use this\n- The teacher already performs the task well\n- I need better behavior under realistic prompts, not just benchmark answers\n- RL is too noisy, too slow, or too expensive as the main training path\n- I care about keeping training aligned with the current policy distribution\n\n## Inputs\n- Teacher model: {{teacher_model_name}}\n- Student model: {{student_model_name}}\n- Task family: {{task_family}}\n- Prompt schema: {{prompt_schema}}\n- Rollout budget: {{rollout_budget}}\n- Filter rules: {{filter_rules}}\n- Evaluation set: {{eval_set}}\n\n## OPD loop\n1. Sample prompts from the current task distribution.\n2. Run the teacher on the exact prompt schema the student will see.\n3. Collect teacher rollouts from the live policy distribution.\n4. Filter out low-quality, malformed, or off-task samples.\n5. Train the student on the remaining traces.\n6. Re-evaluate on held-out prompts.\n7. 
Refresh rollouts and repeat if the prompt distribution or student policy has shifted.\n\n## Practical rules\n- Keep the teacher and student prompt format identical.\n- Do not train on stale traces if the policy has drifted.\n- Prefer explicit filters over vague \"quality\" judgments.\n- If the student gets worse on deployment-shaped prompts, fix the rollout distribution before changing the loss.\n- Use OPD for transfer and consistency; use RL only when you truly need discovery.\n\n## Minimal training note\n- Teacher-generated traces are the supervision signal.\n- On-policy generation keeps the data closer to the behavior you want in production.\n- The whole point is to move known-good behavior, not to rediscover it from scratch.\n\n## Review checklist\n- [ ] Teacher is stronger on the target capability\n- [ ] Prompt schema matches deployment\n- [ ] Rollouts are fresh enough to avoid drift\n- [ ] Filters are explicit and reproducible\n- [ ] Student improves on held-out, production-shaped prompts\n- [ ] Failure cases are logged and inspected\n\n## One-line summary\nOPD is a repeatable way to move capability from a stronger policy into a student model while keeping the training data close to real usage.\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>我刻意把這份模板寫得很樸素，因為真正能活下來的流程通常都不花俏。你要改得更漂亮當然可以，但核心 loop 不要動：current prompts、fresh teacher rollouts、explicit filtering、student training、repeat。這就是 OPD 真正值錢的地方。\u003C\u002Fp>\u003Cp>來源致謝：原始討論來自 \u003Ca href=\"https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F2037285722151989443\">知乎這篇 OPD 文章\u003C\u002Fa>，我也參考了 \u003Ca href=\"https:\u002F\u002Fqwenlm.github.io\u002F\">Qwen\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FMiMo-AI\u002FMiMo\">MiMo\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Fwww.deepseek.com\u002F\">DeepSeek\u003C\u002Fa> 的公開資料脈絡。上面這份拆解裡，概念框架是我自己的整理；實作細節請回頭對照原文與各模型報告。\u003C\u002Fp>","我拆 On-Policy Distillation 的做法，整理成可直接套用的後訓練模板，少碰硬拼 
RL。","zhuanlan.zhihu.com","https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F2037285722151989443",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782730101413-5wjx.png","research","zh","3efb3e20-b2da-4abd-b442-3babd8b0ed1e",[17,18,19,20,21],"OPD","distillation","post-training","RL","on-policy",[23,24,25],"OPD 的重點不是讓模型自己摸索，而是把強 policy 的能力搬進學生模型。","on-policy 的價值在於資料跟著當前行為分佈走，避免舊資料把訓練帶偏。","把 OPD 當成流水線：teacher、rollout、filter、student、repeat，才有機會真的落地。",0,"2026-06-29T10:47:57.457072+00:00","2026-06-29T10:47:57.447+00:00","0c35a120-52fc-41fc-afa3-d404eb934158",{"tags":31,"relatedLang":33,"relatedPosts":37},[32],{"name":18,"slug":18},{"id":15,"slug":34,"title":35,"language":36},"opd-distillation-skills-without-bruteforce-rl-en","OPD lets you distill skills without brute-force RL","en",[38,44,50,56,62,68],{"id":39,"slug":40,"title":41,"cover_image":42,"image_url":42,"created_at":43,"category":13},"6f5be102-5764-44f1-ab3f-722fc5c32c23","google-deepmind-turns-science-into-tools-zh","Google DeepMind把AI變研究工具","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782721105628-g4op.png","2026-06-29T08:17:57.716568+00:00",{"id":45,"slug":46,"title":47,"cover_image":48,"image_url":48,"created_at":49,"category":13},"c649adb7-c8ae-4ade-a092-2c0d53beeb71","measuring-llm-behavior-portability-zh","LLM 行為不一定可移植","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782717472977-na8g.png","2026-06-29T07:17:29.597679+00:00",{"id":51,"slug":52,"title":53,"cover_image":54,"image_url":54,"created_at":55,"category":13},"637c3016-e364-4bfe-904e-5e60a18ed678","prompt-injection-ai-security-problem-zh","Prompt injection 已是 AI 
資安問題","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782716580916-m1nm.png","2026-06-29T07:02:36.173749+00:00",{"id":57,"slug":58,"title":59,"cover_image":60,"image_url":60,"created_at":61,"category":13},"118680f5-6212-4535-986a-50c4a0e71699","solver-choice-nash-equilibrium-selection-zh","求解器會改變納許均衡","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782714784181-t42d.png","2026-06-29T06:32:31.062308+00:00",{"id":63,"slug":64,"title":65,"cover_image":66,"image_url":66,"created_at":67,"category":13},"f303e5bb-372c-48f6-bfc3-f7a73a1e678b","proper-positive-only-learning-characterization-zh","正向樣本學習的完整界線","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782713880760-9ang.png","2026-06-29T06:17:33.749889+00:00",{"id":69,"slug":70,"title":71,"cover_image":72,"image_url":72,"created_at":73,"category":13},"89159fcf-2fbb-4b72-9e05-7928e609a925","dexcompose-reuses-dexterous-policies-across-tasks-zh","DexCompose 讓手部技能可重用","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782712975186-mj1e.png","2026-06-29T06:02:28.144402+00:00",[75,80,85,90,95,100,105,110,115,120],{"id":76,"slug":77,"title":78,"created_at":79},"f18dbadb-8c59-4723-84a4-6ad22746c77a","deepmind-bets-on-continuous-learning-ai-2026-zh","DeepMind 押注 2026 連續學習 AI","2026-03-26T08:16:02.367355+00:00",{"id":81,"slug":82,"title":83,"created_at":84},"f4a106cb-02a6-4508-8f39-9720a0a93cee","ml-papers-of-the-week-github-research-desk-zh","每週 ML 論文清單，為何紅到 GitHub","2026-03-27T01:11:39.284175+00:00",{"id":86,"slug":87,"title":88,"created_at":89},"c4f807ca-4e5f-47f1-a48c-961cf3fc44dc","ai-ml-conferences-to-watch-in-2026-zh","2026 AI 
研討會投稿時程整理","2026-03-27T01:51:53.874432+00:00",{"id":91,"slug":92,"title":93,"created_at":94},"cf046742-efb2-4753-aef9-caed5da5e32e","adaptive-block-scaled-data-types-zh","IF4：神經網路量化的聰明選擇","2026-03-31T06:00:36.990273+00:00",{"id":96,"slug":97,"title":98,"created_at":99},"53a0dc54-0371-4e40-8d5e-74e94a73840c","geometry-aware-similarity-metrics-for-neural-representations-zh","超越距離測量：用微分幾何重新理解神經網路","2026-03-31T06:01:01.241968+00:00",{"id":101,"slug":102,"title":103,"created_at":104},"fee7d472-a775-4b1d-bbc2-1e8bca1bbf8b","on-the-fly-repulsion-in-the-contextual-space-for-rich-divers-zh","讓AI繪圖更有創意：用排斥力提升生成多樣性","2026-03-31T06:01:25.439673+00:00",{"id":106,"slug":107,"title":108,"created_at":109},"a9901203-d69b-447b-8854-15d14eab32b4","vision-aided-beam-prediction-cnn-eca-zh","影像輔助波束預測升級 CNN","2026-04-01T10:00:25.8073+00:00",{"id":111,"slug":112,"title":113,"created_at":114},"b55e7dd4-0a24-4b3d-804d-b0309a03f498","triple-band-fss-mimo-antenna-sub-6-ghz-zh","三頻 FSS MIMO 天線瞄準 sub-6 GHz","2026-04-01T13:18:36.857305+00:00",{"id":116,"slug":117,"title":118,"created_at":119},"f68290bd-e7f3-4b30-ba22-dcd4e0130a66","openclaw-1299-repos-eight-weeks-analysis-zh","OpenClaw 1299 個 Repo 的資料解讀","2026-04-02T05:03:45.208411+00:00",{"id":121,"slug":122,"title":123,"created_at":124},"ed9f80eb-eb02-4d35-8ad4-0ddf428751dd","beam-coherence-aware-combining-mmwave-mimo-zh","毫米波 MIMO 的雙階合併法","2026-04-02T05:27:26.897188+00:00"]