[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tag-llamacpp":3},{"tag":4,"articles":11,"peer_article_count":122},{"id":5,"name":6,"slug":7,"article_count":8,"description_zh":9,"description_en":10},"d7a2807c-2270-4884-8b44-f0ffccfd73a8","llama.cpp","llamacpp",3,"llama.cpp 是把大型語言模型帶到本機與邊緣裝置的推論框架，重點在低記憶體占用、量化、KV cache 管理與啟動速度。相關議題常延伸到 GPU\u002FCPU 混合推論、Rust\u002FCUDA 整合，以及多模態與微調工具鏈的相容性。","llama.cpp is a local inference stack for running LLMs on CPUs, GPUs, and edge devices with tight memory budgets. The topic often covers quantization, KV cache optimization, cold-start latency, and how it fits into fine-tuning and multimodal workflows.",[12,21,28,36,43,50,58,65,72,79,86,93,100,107,115],{"id":13,"slug":14,"title":15,"summary":16,"category":17,"image_url":18,"cover_image":18,"language":19,"created_at":20},"493ea70d-fffd-4365-ba76-63069ada5744","atomicbot-llama-cpp-fork-throughput-gains-zh","AtomicBot 的 llama.cpp 分支，兩條路都加速","4 項改動看懂 AtomicBot 的 llama.cpp 分支：Gemma 4、Qwen 3.6、TurboQuant KV 與權重壓縮，最快可達 30-50% 吞吐提升。","industry","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782332275892-6iw2.png","zh","2026-06-24T20:17:28.725554+00:00",{"id":22,"slug":23,"title":24,"summary":25,"category":17,"image_url":26,"cover_image":26,"language":19,"created_at":27},"84609d0a-d6a7-4228-a5cc-e1170725e28e","llama-cpp-vs-vllm-benji-mo-xing-yin-qing-zen-me-xuan-zh","llama.cpp vs vLLM：本機模型引擎怎麼選","這篇比較 llama.cpp 和 vLLM，幫你判斷是要用 CPU 友善、適合單人本機推理的方案，還是適合多使用者、高併發服務的 GPU 推理引擎。","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782087478586-22tr.png","2026-06-22T00:17:31.282164+00:00",{"id":29,"slug":30,"title":31,"summary":32,"category":33,"image_url":34,"cover_image":34,"language":19,"created_at":35},"83ab893d-aa71-481a-bf79-413e19f9cb41","run-minimax-m3-locally-unsloth-studio-zh","本機跑 MiniMax M3 的 Unsloth Studio 指南","這篇教你在自己的電腦上安裝 Unsloth Studio、下載 MiniMax M3 的 GGUF 量化檔，並成功開啟本機聊天介面。","tools","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781759883897-uij8.png","2026-06-18T05:17:34.558347+00:00",{"id":37,"slug":38,"title":39,"summary":40,"category":33,"image_url":41,"cover_image":41,"language":19,"created_at":42},"9e877017-90c5-4f62-961d-7a31ffb0ed98","llama-cpp-release-kernel-tuning-over-features-zh","llama.cpp 這次又贏了：靠 kernel 收緊，不靠功能堆疊","llama.cpp 的最新版本證明，kernel 修正與 backend 調校，比追逐新功能更能決定本地推理是否真的可用。","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781648269418-o4jo.png","2026-06-16T22:17:23.2234+00:00",{"id":44,"slug":45,"title":46,"summary":47,"category":33,"image_url":48,"cover_image":48,"language":19,"created_at":49},"813c149e-04fb-42c9-a1d8-89ae2f46f66c","llamastash-terminal-native-llamacpp-launcher-zh","LlamaStash 把 llama.cpp 帶進終端機","LlamaStash 是一個 Rust 啟動器，把 llama.cpp 包成 TUI、CLI、daemon 和 OpenAI proxy。它主打單一二進位、低延遲、終端機優先，適合本機 LLM 工作流。","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780993095245-e3v2.png","2026-06-09T08:17:43.665329+00:00",{"id":51,"slug":52,"title":53,"summary":54,"category":55,"image_url":56,"cover_image":56,"language":19,"created_at":57},"5507f140-5223-4f68-ade6-30d9e5457638","gemma-4-12b-specs-benchmarks-run-locally-zh","怎麼做 Gemma 4 12B 本地部署","這篇教你確認 Gemma 4 12B 的硬體需求、看懂公開基準，並在本機跑起多模態模型。","model-release","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780777971165-4bit.png","2026-06-06T20:32:24.857611+00:00",{"id":59,"slug":60,"title":61,"summary":62,"category":17,"image_url":63,"cover_image":63,"language":19,"created_at":64},"8041b1f8-e409-44dc-b574-210938430234","how-to-run-gemma-4-locally-unsloth-zh","怎麼在本機跑 Gemma 4","用 Unsloth Studio 或 llama.cpp 在本機下載、啟動並聊天 Gemma 4。","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780777065583-koml.png","2026-06-06T20:17:21.259919+00:00",{"id":66,"slug":67,"title":68,"summary":69,"category":33,"image_url":70,"cover_image":70,"language":19,"created_at":71},"88902925-b601-4f55-98a6-7c1e020046b2","why-llama-cpp-release-notes-matter-more-than-bragging-zh","為什麼 llama.cpp 的 release notes 比模型吹噓更重要","llama.cpp 的最新版本證明，真正拉開速度差距的不是模型宣傳，而是後端正確性、載入器判斷與跨平台調度。","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779769557026-v0kk.png","2026-05-26T04:25:23.26108+00:00",{"id":73,"slug":74,"title":75,"summary":76,"category":33,"image_url":77,"cover_image":77,"language":19,"created_at":78},"38356be5-0705-44e3-a2bb-36437b5e1276","openhuman-private-personal-ai-local-setup-zh","OpenHuman 讓私有 AI 變本機版","我拆 OpenHuman 的私有個人 AI 玩法，順手給你一份可直接貼進 README 的本機部署模板。","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779625593073-b8to.png","2026-05-24T12:26:03.212125+00:00",{"id":80,"slug":81,"title":82,"summary":83,"category":33,"image_url":84,"cover_image":84,"language":19,"created_at":85},"a17f824d-9049-4f8b-934e-3dfd466123d3","why-llama-cpp-should-treat-turboquant-as-default-zh","為什麼 llama.cpp 應把 TurboQuant 當成新預設路徑","TurboQuant 應成為 llama.cpp 的新預設思路，因為非對稱 KV 壓縮能大幅省記憶體，且不破壞既有相容性。","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779481554771-u2dd.png","2026-05-22T20:25:20.763766+00:00",{"id":87,"slug":88,"title":89,"summary":90,"category":33,"image_url":91,"cover_image":91,"language":19,"created_at":92},"e2412efc-9da1-4984-9875-4f2c18be8724","llama-cpp-local-llm-inference-cpp-zh","llama.cpp 把本地推理做進 C\u002FC++","llama.cpp 強調在 C\u002FC++ 中做本地 LLM 推理，支援多種硬體與 OpenAI 相容伺服器，適合離線、邊緣與隱私場景。","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779480955447-0d7t.png","2026-05-22T20:15:27.912799+00:00",{"id":94,"slug":95,"title":96,"summary":97,"category":17,"image_url":98,"cover_image":98,"language":19,"created_at":99},"e62c3870-f6fe-45e1-8628-082b86195d31","5-kv-cache-takeaways-for-llamacpp-users-zh","5 個 llama.cpp 的 KV cache 重點","5 個重點帶你看懂 llama.cpp 的 KV cache 壓縮、記憶體節省與效能取捨，判斷該追新方法還是先用現有格式。","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779285255441-f432.png","2026-05-20T13:53:42.308292+00:00",{"id":101,"slug":102,"title":103,"summary":104,"category":33,"image_url":105,"cover_image":105,"language":19,"created_at":106},"868034d7-415b-49bd-8f25-4dbd602e7094","unsloth-qwen35-partial-fine-tuning-zh","Unsloth 讓 Qwen3.5 可分層微調","Unsloth 新增 Qwen3.5 視覺模型分層微調，能只訓練 vision、language、attention 或 MLP。VRAM 更省，訓練也更快，對多模態團隊很實用。","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775218014686-wj6q.png","2026-04-03T12:06:38.523525+00:00",{"id":108,"slug":109,"title":110,"summary":111,"category":112,"image_url":113,"cover_image":113,"language":19,"created_at":114},"fdb08bdf-a3bd-4c4d-acaf-ce8035f24449","turboquant-google-paper-explained-zh","TurboQuant 是什麼？Google 新論文重點","Google 的 TurboQuant 盯上 LLM 的 KV cache 瓶頸，用低位元量化降低記憶體用量與推論成本。這篇帶你看它在解什麼問題、和其他優化法差在哪。","research","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775160957331-6iua.png","2026-04-02T20:15:40.07166+00:00",{"id":116,"slug":117,"title":118,"summary":119,"category":33,"image_url":120,"cover_image":120,"language":19,"created_at":121},"d233c90c-e7d8-418d-a8dc-f76080f1b968","turboquant-fast-cold-starts-rust-gpu-zh","TurboQuant、冷啟動與 GPU Rust","TurboQuant 把 KV cache 壓到 4.6 倍，GPU state restore 盯上 32B 模型冷啟動，Rust 也更深入 CUDA 開發。","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775146380823-5d5u.png","2026-04-02T16:12:38.23896+00:00",11]