[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tag-llamacpp":3},{"tag":4,"articles":11,"peer_article_count":93},{"id":5,"name":6,"slug":7,"article_count":8,"description_zh":9,"description_en":10},"d7a2807c-2270-4884-8b44-f0ffccfd73a8","llama.cpp","llamacpp",3,"llama.cpp 是把大型語言模型帶到本機與邊緣裝置的推論框架，重點在低記憶體占用、量化、KV cache 管理與啟動速度。相關議題常延伸到 GPU\u002FCPU 混合推論、Rust\u002FCUDA 整合，以及多模態與微調工具鏈的相容性。","llama.cpp is a local inference stack for running LLMs on CPUs, GPUs, and edge devices with tight memory budgets. The topic often covers quantization, KV cache optimization, cold-start latency, and how it fits into fine-tuning and multimodal workflows.",[12,21,28,36,43,50,57,65,72,79,86],{"id":13,"slug":14,"title":15,"summary":16,"category":17,"image_url":18,"cover_image":18,"language":19,"created_at":20},"cc87056f-b2e8-4ef0-966c-bf82ccffbb54","atomicbot-llama-cpp-fork-throughput-gains-en","AtomicBot’s llama.cpp fork boosts throughput on two fronts","4 ways AtomicBot’s llama.cpp fork speeds up Gemma 4 and Qwen 3.6, with matrix-bench gains up to 30-50% on the right setup.","industry","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782332277361-4xh4.png","en","2026-06-24T20:17:29.158539+00:00",{"id":22,"slug":23,"title":24,"summary":25,"category":17,"image_url":26,"cover_image":26,"language":19,"created_at":27},"2e597d87-bf04-421c-8cb6-bb024bfca2cf","llama-cpp-vs-vllm-choosing-the-right-local-llm-engine-en","llama.cpp vs vLLM: Choosing the right local LLM engine","llama.cpp and vLLM are local LLM inference engines for different hardware and traffic patterns.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782087479497-fvfw.png","2026-06-22T00:17:31.700814+00:00",{"id":29,"slug":30,"title":31,"summary":32,"category":33,"image_url":34,"cover_image":34,"language":19,"created_at":35},"796113f3-61af-4985-9d09-afefbd99d013","run-minimax-m3-locally-unsloth-studio-en","Run MiniMax M3 locally in Unsloth Studio","Set up Unsloth Studio to download and run MiniMax M3 on your own machine.","tools","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781759880801-p006.png","2026-06-18T05:17:34.96983+00:00",{"id":37,"slug":38,"title":39,"summary":40,"category":33,"image_url":41,"cover_image":41,"language":19,"created_at":42},"8f7dbc25-a9a2-4539-a4d1-8cd9932444e1","open-source-ai-software-infrastructure-wins-en","Open-source AI software is winning on infrastructure, not hype","Open-source AI software is winning because it now powers the core infrastructure for building, serving, and shipping models.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781691474026-aqqd.png","2026-06-17T10:17:27.28173+00:00",{"id":44,"slug":45,"title":46,"summary":47,"category":33,"image_url":48,"cover_image":48,"language":19,"created_at":49},"00938978-e7c5-4815-83bc-9abb1194e33f","llama-cpp-release-kernel-tuning-over-features-en","llama.cpp’s latest release proves the project still wins by tightenin…","llama.cpp’s latest release shows that careful kernel fixes and backend tuning matter more than flashy features.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781648264517-0qhb.png","2026-06-16T22:17:23.764635+00:00",{"id":51,"slug":52,"title":53,"summary":54,"category":33,"image_url":55,"cover_image":55,"language":19,"created_at":56},"1a47f4ce-884c-4c5f-b5e1-2b117799fcce","ollama-default-local-ai-layer-en","Ollama is becoming the default local AI layer","Ollama is no longer just a local model runner; it is turning into the default AI layer for apps and agents.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781310763401-cv9v.png","2026-06-13T00:32:16.812947+00:00",{"id":58,"slug":59,"title":60,"summary":61,"category":62,"image_url":63,"cover_image":63,"language":19,"created_at":64},"0e767e9d-5d17-4cd0-b6ee-0328f89eb49b","gemma-4-12b-specs-benchmarks-run-locally-en","Gemma 4 12B: Specs, Benchmarks & How to Run It Locally","Gemma 4 12B is a local-first multimodal model you can run on a 16 GB machine.","model-release","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780777984661-5ymr.png","2026-06-06T20:32:25.294996+00:00",{"id":66,"slug":67,"title":68,"summary":69,"category":33,"image_url":70,"cover_image":70,"language":19,"created_at":71},"a7daef63-2e7d-4942-8bc1-7ebbe31ebb52","why-llama-cpp-release-notes-matter-more-than-bragging-en","Why llama.cpp’s release notes matter more than its model bragging","llama.cpp’s latest releases show that backend correctness drives real speed gains.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779769553066-1mx4.png","2026-05-26T04:25:24.65574+00:00",{"id":73,"slug":74,"title":75,"summary":76,"category":33,"image_url":77,"cover_image":77,"language":19,"created_at":78},"8a164bd6-6f92-47a6-87fb-72a6371aae17","why-llama-cpp-should-treat-turboquant-as-default-en","Why llama.cpp should treat TurboQuant as the new default path","TurboQuant is the right direction for llama.cpp because asymmetric KV compression cuts memory without breaking compatibility.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779481556833-a9v3.png","2026-05-22T20:25:23.12744+00:00",{"id":80,"slug":81,"title":82,"summary":83,"category":33,"image_url":84,"cover_image":84,"language":19,"created_at":85},"5ed4267c-b54b-4c73-8192-79bfacaf438d","llama-cpp-local-llm-inference-cpp-en","llama.cpp adds local LLM inference in C\u002FC++","ggml-org’s llama.cpp keeps expanding local LLM support with OpenAI-compatible serving, browser WebGPU, and broad hardware backends.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779480952129-tpau.png","2026-05-22T20:15:28.848286+00:00",{"id":87,"slug":88,"title":89,"summary":90,"category":17,"image_url":91,"cover_image":91,"language":19,"created_at":92},"bfbd028b-4704-4de5-8f54-55625836952f","5-kv-cache-takeaways-for-llamacpp-users-en","5 KV cache takeaways for llama.cpp users","5 takeaways from TurboQuant: under-3-bit KV cache compression, memory savings, and the tradeoffs llama.cpp users should watch.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779285258553-domr.png","2026-05-20T13:53:43.522918+00:00",15]