[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-turboquant-eden-citation-fight-zh":3,"article-related-turboquant-eden-citation-fight-zh":28,"series-research-4242e1bf-4f38-488d-9f92-ccb4f5b70319":81},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":11,"views":25,"created_at":26,"published_at":27,"topic_cluster_id":11},"4242e1bf-4f38-488d-9f92-ccb4f5b70319","turboquant-eden-citation-fight-zh","TurboQuant、EDEN 與引用爭議","\u003Cp>TurboQuant 一開始很吸睛。它主打 KV-cache 6x 壓縮。這種數字很容易讓人停下來看。因為在 LLM 推論裡，記憶體和延遲都很貴。\u003C\u002Fp>\u003Cp>但話題很快歪掉。爭議不在壓縮比，而在引用。EDEN 團隊直接說，TurboQuant 很像舊方法的縮小版。這種說法很刺耳，但也很常見。\u003C\u002Fp>\u003Cp>講白了，這不是只有學術圈在吵。KV-cache 壓縮會影響推論成本。也會影響 token throughput。對跑服務的團隊來說，差一點點就可能差很多。\u003C\u002Fp>\u003Ch2>TurboQuant 到底在做什麼\u003C\u002Fh2>\u003Cp>先講技術本體。\u003Ca href=\"https:\u002F\u002Fdocs.vllm.ai\u002Fen\u002Flatest\u002Fapi\u002Fvllm\u002Fmodel_executor\u002Flayers\u002Fquantization\u002Fturboquant.html\" target=\"_blank\" rel=\"noopener\">TurboQuant\u003C\u002Fa> 是拿來壓縮 transformer 推論時的 KV-cache。KV-cache 會存過去 token 的 key 和 value。這樣模型不用每次重算前文。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1777467063814-l8dk.png\" alt=\"TurboQuant、EDEN 與引用爭議\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>問題是，context 越長，cache 就越大。這會吃掉更多顯存。也會讓推論成本往上跑。於是大家開始玩量化，想把記憶體壓下來。\u003C\u002Fp>\u003Cp>TurboQuant 的爭議點，在於它看起來不像全新量化器。批評者說，它比較像舊方法的組合。只是寫法更簡單，說法更好懂。\u003C\u002Fp>\u003Cul>\u003Cli>TurboQuant 主打 KV-cache 壓縮。\u003C\u002Fli>\u003Cli>批評者說它像 \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.02170\" target=\"_blank\" 
rel=\"noopener\">DRIVE\u003C\u002Fa> 的延伸。\u003C\u002Fli>\u003Cli>也有人說它和 \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.15421\" target=\"_blank\" rel=\"noopener\">EDEN\u003C\u002Fa> 很接近。\u003C\u002Fli>\u003Cli>爭點集中在 scale 與 residual 設計。\u003C\u002Fli>\u003C\u002Ful>\u003Cp>這裡的重點很現實。方法可以舊，但應用可以新。這沒問題。問題是，你不能把舊骨架包成新發明。尤其在 AI 論文裡，這種包裝太常見了。\u003C\u002Fp>\u003Cp>更麻煩的是，這種方法很難只看標題判斷。壓縮比看起來漂亮，不代表細節也漂亮。scale 怎麼設。bit 怎麼分。誤差怎麼累積。這些都會影響最後結果。\u003C\u002Fp>\u003Cp>所以，TurboQuant 真正讓人皺眉的，不是它有沒有用。是它到底新在哪裡。這個問題沒講清楚，後面所有數字都會變得很尷尬。\u003C\u002Fp>\u003Ch2>為什麼 EDEN 團隊會不爽\u003C\u002Fh2>\u003Cp>這場爭議的核心，是引用順序。\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.02170\" target=\"_blank\" rel=\"noopener\">DRIVE\u003C\u002Fa> 早在 2021 年就做了 post-rotation 的 distribution-aware quantization。後來的 \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.15421\" target=\"_blank\" rel=\"noopener\">EDEN\u003C\u002Fa> 又把這套想法往前推。\u003C\u002Fp>\u003Cp>EDEN 團隊的說法很直接。他們認為 TurboQuant 只是更受限的版本。scale 選擇也比較弱。殘差量化的處理方式，還可能讓誤差更大。\u003C\u002Fp>\u003Cp>這種爭議在 ML 圈不稀奇。但它每次都會讓人火大。因為大家都知道，citation 不是裝飾品。它決定誰被看見。\u003C\u002Fp>\u003Cblockquote>“We were the first to introduce post-rotation distribution-aware quantization in 2021.”\u003C\u002Fblockquote>\u003Cp>這句話出自 HN 討論。意思很清楚。先做的人，想要被正確記住。這很合理。你辛苦寫出來的公式，不該被後來的包裝吃掉。\u003C\u002Fp>\u003Cp>我覺得這裡最刺的是，很多人會把「能跑」和「原創」混在一起。其實兩者差很多。能跑是工程。原創是論文脈絡。\u003C\u002Fp>\u003Cp>如果 TurboQuant 真的只是 EDEN 的變體，那它就應該老實寫成變體。這不是小氣。這是基本職業道德。\u003C\u002Fp>\u003Cp>而且這件事不只關乎名聲。還關乎後面誰會接著做。引用錯了，研究路線也會跟著歪。\u003C\u002Fp>\u003Ch2>數字怎麼看才不會被帶風向\u003C\u002Fh2>\u003Cp>爭議裡最常被拿來講的，是 6x 壓縮。這個數字很大聲。可是大聲不等於公平。你要先看測試條件，再看比較對象。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg 
src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1777467076981-ix4b.png\" alt=\"TurboQuant、EDEN 與引用爭議\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>在 HN 討論裡，有人指出 TurboQuant 的 benchmark 不太對等。像是某些比較用了單核心 CPU。TurboQuant 那邊卻跑在 \u003Ca href=\"https:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fdata-center\u002Fa100\u002F\" target=\"_blank\" rel=\"noopener\">A100\u003C\u002Fa> GPU 上。這種比法很容易把結果弄歪。\u003C\u002Fp>\u003Cp>另外，社群也提到 \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fjy-yuan\u002FKIVI\" target=\"_blank\" rel=\"noopener\">KIVI\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.17525\" target=\"_blank\" rel=\"noopener\">HIGGS\u003C\u002Fa>，還有 \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.19392\" target=\"_blank\" rel=\"noopener\">Cache Me If You Must\u003C\u002Fa>。這些方法都在不同面向處理 KV-cache 或量化問題。\u003C\u002Fp>\u003Cul>\u003Cli>TurboQuant 主打 6x 壓縮。\u003C\u002Fli>\u003Cli>有說法稱 2-bit EDEN 在某些情境贏過 3-bit TurboQuant。\u003C\u002Fli>\u003Cli>也有人指出 EDEN 的 unbiased 設計更準。\u003C\u002Fli>\u003Cli>benchmark 可能混用了 CPU 和 GPU。\u003C\u002Fli>\u003C\u002Ful>\u003Cp>這些數字放在一起看，味道就變了。若一個方法只是在特定硬體上贏，那它的實用價值就要打折。工程師最怕這種 paper win。\u003C\u002Fp>\u003Cp>再來是 reproducibility。OpenReview 上如果有人重跑不出來，那就很麻煩。因為推論系統不是寫作文。你不能只看圖漂亮。\u003C\u002Fp>\u003Cp>我自己的判斷很簡單。若 2-bit EDEN 在你的情境裡比 3-bit TurboQuant 還穩，那就別被標題騙了。實測比較重要。論文標語不會幫你省顯存。\u003C\u002Fp>\u003Ch2>這件事其實很像 AI 圈老毛病\u003C\u002Fh2>\u003Cp>TurboQuant 不是孤例。AI 圈很常把舊點子重新包裝。換個名字。換個圖表。換個 benchmark。然後大家又開始轉貼。\u003C\u002Fp>\u003Cp>這種現象之所以多，是因為論文和產品節奏太快。研究者想發表。工程師想上線。新創想講故事。三方需求不一樣。\u003C\u002Fp>\u003Cp>結果就是，真正重要的內容常被包裝蓋掉。原始方法可能沒那麼會講故事。可是它可能更完整，也更值得引用。\u003C\u002Fp>\u003Cp>如果你是台灣的開發者，這件事很實際。你在選 LLM 推論方案時，不能只看壓縮比。還要看硬體、延遲、吞吐量、準確率，還有實作成本。\u003C\u002Fp>\u003Cp>像 \u003Ca href=\"https:\u002F\u002Fvllm.ai\u002F\" target=\"_blank\" rel=\"noopener\">vLLM\u003C\u002Fa> 
這種推論框架，會把方法放進真正的服務路徑。這時候，理論上的小差異，會變成機房裡的電費差異。\u003C\u002Fp>\u003Cp>所以我會說，TurboQuant 的價值不一定在原創。它比較像一個案例。提醒大家：論文名字很會唬人，但資料和 benchmark 不會說謊。\u003C\u002Fp>\u003Ch2>接下來該怎麼看這類論文\u003C\u002Fh2>\u003Cp>如果你在評估 KV-cache 壓縮，先回頭看舊論文。先看 \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.02170\" target=\"_blank\" rel=\"noopener\">DRIVE\u003C\u002Fa>。再看 \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.15421\" target=\"_blank\" rel=\"noopener\">EDEN\u003C\u002Fa>。你會更容易看出 TurboQuant 到底改了什麼。\u003C\u002Fp>\u003Cp>接著，把比較條件對齊。相同 GPU。相同 bit width。相同 accuracy target。相同 context length。少一項，結果都可能變味。\u003C\u002Fp>\u003Cp>最後，別只看壓縮比。要一起看 latency、throughput、顯存占用，還有實作複雜度。講白了，能進 production 的方法，才算真的有用。\u003C\u002Fp>\u003Cp>我的預測很直接。這類爭議只會越來越多。因為 LLM 基礎設施越來越成熟，大家開始更在意 citation、benchmark 和 reproducibility。你下次看到一個很猛的數字時，先問一句：這是新東西，還是舊東西換包裝？\u003C\u002Fp>","TurboQuant 主打 KV-cache 6x 壓縮，卻被指和 DRIVE、EDEN 同源，還有 scale 選擇與 benchmark 公平性爭議。","news.ycombinator.com","https:\u002F\u002Fnews.ycombinator.com\u002Fitem?id=47916890",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1777467063814-l8dk.png","research","zh","d7b529f2-02b7-4d5b-bf82-490aa5fe8362",[17,18,19,20,21,22,23,24],"TurboQuant","EDEN","DRIVE","KV-cache","量化","LLM推論","benchmark","citation爭議",3,"2026-04-29T12:50:45.096442+00:00","2026-04-29T12:50:44.936+00:00",{"tags":29,"relatedLang":40,"relatedPosts":44},[30,32,35,36,38],{"name":18,"slug":31},"eden",{"name":33,"slug":34},"KV cache","kv-cache",{"name":21,"slug":21},{"name":17,"slug":37},"turboquant",{"name":19,"slug":39},"drive",{"id":15,"slug":41,"title":42,"language":43},"turboquant-eden-citation-fight-en","TurboQuant, EDEN, and the citation fight","en",[45,51,57,63,69,75],{"id":46,"slug":47,"title":48,"cover_image":49,"image_url":49,"created_at":50,"category":13},"33c9a55c-a8c0-4367-b742-f4567d1e98e3","mathematicians-warn-ai-could-distort-math-zh","數學界警告 AI 
會扭曲證明標準","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780504386035-080l.png","2026-06-03T16:32:29.415063+00:00",{"id":52,"slug":53,"title":54,"cover_image":55,"image_url":55,"created_at":56,"category":13},"5c3cb90f-7efd-426f-8c09-32a303f82be9","humanoid-gpt-zero-shot-motion-tracking-zh","Humanoid-GPT：用 GPT 擴大動作追蹤","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780469319284-znpc.png","2026-06-03T06:47:34.463464+00:00",{"id":58,"slug":59,"title":60,"cover_image":61,"image_url":61,"created_at":62,"category":13},"e3a4b0f7-03b3-43c6-ae51-906b337c5c2f","ipt-vlms-hidden-space-reasoning-zh","IPT 讓 VLM 更會想像隱藏空間","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780468394735-1k40.png","2026-06-03T06:32:46.560029+00:00",{"id":64,"slug":65,"title":66,"cover_image":67,"image_url":67,"created_at":68,"category":13},"5fca9fe5-af66-47ce-85f0-0ffe1bee30b9","neuron-selectivity-changes-with-scale-zh","神經元選擇性會隨規模改變","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780467514422-7oss.png","2026-06-03T06:17:44.126547+00:00",{"id":70,"slug":71,"title":72,"cover_image":73,"image_url":73,"created_at":74,"category":13},"9f9c2a61-d058-4c62-bb88-106e683657f0","nasa-landsat-wild-disturbances-rising-zh","NASA Landsat：野火與風暴變多","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780448581102-owp0.png","2026-06-03T01:02:37.513233+00:00",{"id":76,"slug":77,"title":78,"cover_image":79,"image_url":79,"created_at":80,"category":13},"3479bdee-21fb-4fda-9572-9394caba01b0","adacodec-predictive-visual-code-video-mllms-zh","AdaCodec 用預測碼壓縮影片 
token","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780381988591-z2sp.png","2026-06-02T06:32:28.249023+00:00",[82,87,92,97,102,107,112,117,122,127],{"id":83,"slug":84,"title":85,"created_at":86},"f18dbadb-8c59-4723-84a4-6ad22746c77a","deepmind-bets-on-continuous-learning-ai-2026-zh","DeepMind 押注 2026 連續學習 AI","2026-03-26T08:16:02.367355+00:00",{"id":88,"slug":89,"title":90,"created_at":91},"f4a106cb-02a6-4508-8f39-9720a0a93cee","ml-papers-of-the-week-github-research-desk-zh","每週 ML 論文清單，為何紅到 GitHub","2026-03-27T01:11:39.284175+00:00",{"id":93,"slug":94,"title":95,"created_at":96},"c4f807ca-4e5f-47f1-a48c-961cf3fc44dc","ai-ml-conferences-to-watch-in-2026-zh","2026 AI 研討會投稿時程整理","2026-03-27T01:51:53.874432+00:00",{"id":98,"slug":99,"title":100,"created_at":101},"cf046742-efb2-4753-aef9-caed5da5e32e","adaptive-block-scaled-data-types-zh","IF4：神經網路量化的聰明選擇","2026-03-31T06:00:36.990273+00:00",{"id":103,"slug":104,"title":105,"created_at":106},"53a0dc54-0371-4e40-8d5e-74e94a73840c","geometry-aware-similarity-metrics-for-neural-representations-zh","超越距離測量：用微分幾何重新理解神經網路","2026-03-31T06:01:01.241968+00:00",{"id":108,"slug":109,"title":110,"created_at":111},"fee7d472-a775-4b1d-bbc2-1e8bca1bbf8b","on-the-fly-repulsion-in-the-contextual-space-for-rich-divers-zh","讓AI繪圖更有創意：用排斥力提升生成多樣性","2026-03-31T06:01:25.439673+00:00",{"id":113,"slug":114,"title":115,"created_at":116},"a9901203-d69b-447b-8854-15d14eab32b4","vision-aided-beam-prediction-cnn-eca-zh","影像輔助波束預測升級 CNN","2026-04-01T10:00:25.8073+00:00",{"id":118,"slug":119,"title":120,"created_at":121},"b55e7dd4-0a24-4b3d-804d-b0309a03f498","triple-band-fss-mimo-antenna-sub-6-ghz-zh","三頻 FSS MIMO 天線瞄準 sub-6 GHz","2026-04-01T13:18:36.857305+00:00",{"id":123,"slug":124,"title":125,"created_at":126},"f68290bd-e7f3-4b30-ba22-dcd4e0130a66","openclaw-1299-repos-eight-weeks-analysis-zh","OpenClaw 1299 個 Repo 
的資料解讀","2026-04-02T05:03:45.208411+00:00",{"id":128,"slug":129,"title":130,"created_at":131},"ed9f80eb-eb02-4d35-8ad4-0ddf428751dd","beam-coherence-aware-combining-mmwave-mimo-zh","毫米波 MIMO 的雙階合併法","2026-04-02T05:27:26.897188+00:00"]