[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tag-vllm":3},{"tag":4,"articles":11,"peer_article_count":115},{"id":5,"name":6,"slug":7,"article_count":8,"description_zh":9,"description_en":10},"6acb2d1f-934e-4e31-a9d1-8e4392fb099a","vLLM","vllm",6,"vLLM 是面向大型語言模型的高吞吐推理引擎，重點在 PagedAttention、KV cache 管理與連續批次處理，讓 GPU 更有效率地服務聊天、RAG、批次生成與多模型部署。","vLLM is a high-throughput inference engine for large language models, built around PagedAttention, KV cache management, and continuous batching. It matters for chat services, RAG pipelines, batch generation, and multi-model GPU deployment.",[12,21,28,36,44,51,58,66,73,80,87,94,101,108],{"id":13,"slug":14,"title":15,"summary":16,"category":17,"image_url":18,"cover_image":18,"language":19,"created_at":20},"5521addb-874b-44fe-a38d-32f4299010d2","open-source-ai-projects-developers-2026-en","7 open-source AI projects developers need in 2026","Seven open-source AI projects are replacing paid APIs, from local inference to browser agents, and they’re already pulling huge GitHub numbers.","tools","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782593283378-u4f3.png","en","2026-06-27T20:47:36.97629+00:00",{"id":22,"slug":23,"title":24,"summary":25,"category":17,"image_url":26,"cover_image":26,"language":19,"created_at":27},"6b6d7ea7-7e46-49ca-9e01-ce4e55eab086","vllm-sglang-vmlx-local-llm-runtimes-en","vLLM, SGLang, vMLX: better local LLM runtimes","Ollama and llama.cpp are the easy starts, but vLLM, SGLang, vMLX, MLC-LLM, and ExLlamaV3 fit serious local AI workflows.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782397979639-yxxb.png","2026-06-25T14:32:28.375358+00:00",{"id":29,"slug":30,"title":31,"summary":32,"category":33,"image_url":34,"cover_image":34,"language":19,"created_at":35},"9fd702bc-6c80-4d27-8f85-5971f898bef3","ultraquant-4bit-kv-caching-agents-en","UltraQuant: 4-bit KV caching for long agents","UltraQuant shows 4-bit KV caching can speed long, multi-turn agent serving while keeping more context resident.","research","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782331384598-tjhi.png","2026-06-24T20:02:33.028079+00:00",{"id":37,"slug":38,"title":39,"summary":40,"category":41,"image_url":42,"cover_image":42,"language":19,"created_at":43},"2e597d87-bf04-421c-8cb6-bb024bfca2cf","llama-cpp-vs-vllm-choosing-the-right-local-llm-engine-en","llama.cpp vs vLLM: Choosing the right local LLM engine","llama.cpp and vLLM are local LLM inference engines for different hardware and traffic patterns.","industry","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782087479497-fvfw.png","2026-06-22T00:17:31.700814+00:00",{"id":45,"slug":46,"title":47,"summary":48,"category":17,"image_url":49,"cover_image":49,"language":19,"created_at":50},"77c071b4-4373-449e-b812-2577d9644514","deploy-minimax-m3-with-vllm-openai-api-en","Deploy MiniMax M3 with vLLM OpenAI API","Run MiniMax M3 locally with vLLM and expose an OpenAI-compatible API.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781954275829-y5gk.png","2026-06-20T11:17:30.525369+00:00",{"id":52,"slug":53,"title":54,"summary":55,"category":41,"image_url":56,"cover_image":56,"language":19,"created_at":57},"44a50d6d-bec8-4b1e-a4f8-afab437292c8","red-hat-ai-mavenir-telco-ai-stack-en","Red Hat AI turns telco AI into a stack","Mavenir and Red Hat show how telcos can package AI with MLOps, vLLM inference, and AgentOps on Kubernetes.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781885892078-r3ek.png","2026-06-19T16:17:38.760812+00:00",{"id":59,"slug":60,"title":61,"summary":62,"category":63,"image_url":64,"cover_image":64,"language":19,"created_at":65},"ccc46975-50d1-4ece-8fd3-c082bf4858ae","self-host-minimax-m3-gpu-cloud-en","Self-host MiniMax M3 on GPU cloud","MiniMax M3 brings 229.9B MoE weights, 1M context, and multimodal output, but it needs serious GPU memory to run.","model-release","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781716680837-ikof.png","2026-06-17T17:17:35.800599+00:00",{"id":67,"slug":68,"title":69,"summary":70,"category":17,"image_url":71,"cover_image":71,"language":19,"created_at":72},"8f7dbc25-a9a2-4539-a4d1-8cd9932444e1","open-source-ai-software-infrastructure-wins-en","Open-source AI software is winning on infrastructure, not hype","Open-source AI software is winning because it now powers the core infrastructure for building, serving, and shipping models.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781691474026-aqqd.png","2026-06-17T10:17:27.28173+00:00",{"id":74,"slug":75,"title":76,"summary":77,"category":41,"image_url":78,"cover_image":78,"language":19,"created_at":79},"093f7c46-be7c-4b62-be00-73808a61e0a0","turboquant-amd-gpus-kv-cache-latency-en","TurboQuant on AMD GPUs cuts KV-cache latency","TurboQuant on AMD GPUs improves long-context LLM serving with up to 3.6x speedup and far lower KV-cache pressure.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781299067778-3pzd.png","2026-06-12T21:17:26.07+00:00",{"id":81,"slug":82,"title":83,"summary":84,"category":17,"image_url":85,"cover_image":85,"language":19,"created_at":86},"49dbda12-d94e-4e41-99d0-200d57eb97a9","turboquant-vllm-kv-cache-3bit-storage-en","TurboQuant turns vLLM KV cache into 3-bit storage","I break down TurboQuant’s vLLM cache compression and give you a copy-ready setup for 3-bit KV cache and fallback paths.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779286502445-214g.png","2026-05-20T14:14:37.831446+00:00",{"id":88,"slug":89,"title":90,"summary":91,"category":63,"image_url":92,"cover_image":92,"language":19,"created_at":93},"3e183760-52e4-4491-8ea9-be02ccac1042","minimax-m2-open-source-agentic-coding-en","MiniMax M2 opens up cheap agentic coding","MiniMax open-sourced M2, a model for agents and code that costs $0.30 per million input tokens and is free for a limited time.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779077044155-be13.png","2026-05-18T04:03:38.169023+00:00",{"id":95,"slug":96,"title":97,"summary":98,"category":33,"image_url":99,"cover_image":99,"language":19,"created_at":100},"670a7f69-911f-41e8-a18b-7d3491253a19","turboquant-vllm-comparison-fp8-kv-cache-en","TurboQuant vs FP8: vLLM’s first broad test","vLLM found FP8 KV-cache quantization beats TurboQuant on speed, while TurboQuant’s strongest variants hurt accuracy.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778839858405-b5ao.png","2026-05-15T10:10:37.219158+00:00",{"id":102,"slug":103,"title":104,"summary":105,"category":17,"image_url":106,"cover_image":106,"language":19,"created_at":107},"6dcd6852-b95a-4f62-853a-cc7eb32fff1a","gemma-4-assistant-models-faster-draft-tokens-en","Gemma 4 assistant models get faster draft tokens","Gemma 4 E2B and E4B assistant models use centroid masking to cut lm_head work about 45x with little quality loss.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778278254841-r19z.png","2026-05-08T22:10:34.02358+00:00",{"id":109,"slug":110,"title":111,"summary":112,"category":17,"image_url":113,"cover_image":113,"language":19,"created_at":114},"00a0853d-92b0-45e5-bfcd-97d7f77ec8a0","awesome-open-source-ai-projects-list-en","Awesome Open Source AI: the best projects list","This GitHub list curates battle-tested open-source AI tools, models, and infra, from PyTorch to vLLM, with 2,486 stars.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775999039728-fc7m.png","2026-04-12T13:03:36.707391+00:00",16]