[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-gemma-4-12b-specs-benchmarks-run-locally-zh":3,"article-related-gemma-4-12b-specs-benchmarks-run-locally-zh":31,"series-model-release-5507f140-5223-4f68-ade6-30d9e5457638":84},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":23,"views":27,"created_at":28,"published_at":29,"topic_cluster_id":30},"5507f140-5223-4f68-ade6-30d9e5457638","gemma-4-12b-specs-benchmarks-run-locally-zh","怎麼做 Gemma 4 12B 本地部署","\u003Cp data-speakable=\"summary\">這篇教你確認 Gemma 4 12B 的硬體需求、看懂公開\u003Ca href=\"\u002Fnews\u002Fllama-benchy-api-benchmark-zh\">基準\u003C\u002Fa>，並在\u003Ca href=\"\u002Fnews\u002Fhow-to-run-gemma-4-locally-unsloth-zh\">本機跑\u003C\u002Fa>起多模態\u003Ca href=\"\u002Fnews\u002Fllm-leaderboard-2026-300-models-ranked-zh\">模型\u003C\u002Fa>。\u003C\u002Fp>\u003Cp>這篇給想把 Gemma 4 12B 直接部署到筆電或桌機的開發者看，重點是先判斷硬體能不能跑，再把模型接進自己的應用。\u003C\u002Fp>\u003Cp>照著做完，你會拿到一套可用的本地推理環境、一份可對照的效能讀法，還有一個能處理文字、圖片、音訊與影片輸入的實作路線。\u003C\u002Fp>\u003Ch2>開始之前\u003C\u002Fh2>\u003Cul>\u003Cli>Google 帳號，用來查看模型文件與存取說明。\u003C\u002Fli>\u003Cli>已安裝 \u003Ca href=\"https:\u002F\u002Follama.com\u002Fdocs\">Ollama 文件\u003C\u002Fa> 或 \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fllama.cpp\">llama.cpp GitHub repo\u003C\u002Fa> 對應的執行環境。\u003C\u002Fli>\u003Cli>Node 20+ 或 Python 3.11+，用來串接你的應用程式。\u003C\u002Fli>\u003Cli>至少 16 GB RAM 或 16 GB VRAM，才適合做實際本地推理。\u003C\u002Fli>\u003Cli>Apple Silicon Mac 且有 16 GB unified memory，如果你要走 MLX 路線。\u003C\u002Fli>\u003Cli>已下載一份 Gemma 4 12B 的量化 GGUF 或 MLX 版本。\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>Step 1: 確認硬體配置\u003C\u002Fh2>\u003Cp>這一步的產出是「部署路線表」，因為先決定硬體是否符合 16 GB 等級，後面才不會白忙一場。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780777971165-4bit.png\" alt=\"怎麼做 Gemma 4 12B 本地部署\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>先確認你手上是 16 GB VRAM 的 GPU、16 GB unified memory 的 Mac，或足夠的系統 RAM。若不確定，先把目標設成 Q4 量化，這是本地跑 12B 模型最實際的起點。\u003C\u002Fp>\u003Cp>你應該能明確說出自己要走 Ollama、llama.cpp，或 MLX 其中一條路，而不是只知道「想試試看」。\u003C\u002Fp>\u003Ch2>Step 2: 安裝本地推理引擎\u003C\u002Fh2>\u003Cp>這一步的產出是「可啟動的推理引擎」，因為模型只有在 runner 能載入後才真的能用。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780777976567-e574.png\" alt=\"怎麼做 Gemma 4 12B 本地部署\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>依你的工作流選一個 runtime。想要最省事的 CLI，就用 Ollama；想要更高控制度，就用 llama.cpp；\u003Ca href=\"\u002Ftag\u002Fapple\">Apple\u003C\u002Fa> Silicon 則優先考慮 MLX。\u003C\u002Fp>\u003Cpre>\u003Ccode># Ollama 範例\nollama pull gemma4:12b\nollama run gemma4:12b\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>你應該看到模型成功載入，並在終端機或介面中回傳一段短回答。\u003C\u002Fp>\u003Ch2>Step 3: 載入量化模型檔\u003C\u002Fh2>\u003Cp>這一步的產出是「可在你機器上跑的模型檔」，因為 12B 版本要靠量化才能落在本地可用的記憶體範圍內。\u003C\u002Fp>\u003Cp>如果你用 llama.cpp，就下載 GGUF 的 Q4 類量化；如果你用 LM Studio，就在模型瀏覽器選同級量化；如果你用 MLX，就選對應 Apple Silicon 與記憶體預算的版本。\u003C\u002Fp>\u003Cp>你應該看到模型能順利啟動，不會頻繁 swap，也不會在第一個 prompt 就卡死或中斷。\u003C\u002Fp>\u003Ch2>Step 4: 驗證多模態輸入\u003C\u002Fh2>\u003Cp>這一步的產出是「多模態驗收結果」，因為它能證明模型不只會回文字，還能處理圖片、音訊與影片。\u003C\u002Fp>\u003Cp>若你的 runtime 支援，就各送一次圖片、短音訊與短影片。Gemma 4 12B 是 encoder-free 架構，所以同一條 decoder 路徑可以處理這些輸入型態。\u003C\u002Fp>\u003Cp>你應該看到的是 caption、逐字稿或摘要，而且內容要真的對應你上傳的媒體，而不是泛用的文字回覆。\u003C\u002Fp>\u003Ch2>Step 5: 量測本機速度\u003C\u002Fh2>\u003Cp>這一步的產出是「本機吞吐數字」，因為實際速度比發表時的說法更能決定你要不要上線。\u003C\u002Fp>\u003Cp>先跑一段短文字 prompt，記下 tokens per second，再用你的目標 context length 重跑一次。社群測試曾回報 RTX 4060 經 llama.cpp 約 21 tokens\u002Fs，MacBook Pro 透過 MLX 也有順暢表現。\u003C\u002Fp>\u003Cp>你也可以對照官方模型卡與自己的實測，因為 \u003Ca href=\"\u002Ftag\u002Fgoogle\">Google\u003C\u002Fa> 的說法是 12B 在標準基準上接近 26B MoE，但記憶體占用不到一半。\u003C\u002Fp>\u003Cp>你應該看到穩定的 \u003Ca href=\"\u002Ftag\u002Ftoken\">token\u003C\u002Fa> 生成速度，且在你的工作負載下不會因量化或 context 變長而完全失速。\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>指標\u003C\u002Fth>\u003Cth>基準／優化前\u003C\u002Fth>\u003Cth>結果／優化後\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>記憶體占用\u003C\u002Ftd>\u003Ctd>Gemma 3 27B 級本地執行\u003C\u002Ftd>\u003Ctd>Gemma 4 12B 少於一半記憶體占用\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>基準表現\u003C\u002Ftd>\u003Ctd>較早期的 Gemma 3 27B\u003C\u002Ftd>\u003Ctd>公開說法顯示 Gemma 4 12B 在多項測試上更好\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>社群速度\u003C\u002Ftd>\u003Ctd>一般桌機本地推理\u003C\u002Ftd>\u003Ctd>RTX 4060 透過 llama.cpp 約 21 tokens\u002Fs\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>Step 6: 接上你的應用程式\u003C\u002Fh2>\u003Cp>這一步的產出是「可用的本地應用」，例如文件摘要器、私有助理，或內部工具。\u003C\u002Fp>\u003Cp>如果你用 Ollama，就把應用指向 localhost:11434 的 \u003Ca href=\"\u002Ftag\u002Fopenai\">OpenAI\u003C\u002Fa> 相容端點；如果你用 llama.cpp 或 MLX，就用你偏好的 SDK 包一層本地 server 或 binding，然後加上自己的 prompt template。\u003C\u002Fp>\u003Cpre>\u003Ccode>POST http:\u002F\u002Flocalhost:11434\u002Fv1\u002Fchat\u002Fcompletions\n{\n  \"model\": \"gemma4:12b\",\n  \"messages\": [\n    {\"role\": \"user\", \"content\": \"Summarize this invoice and list due dates.\"}\n  ]\n}\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>你應該看到應用程式直接透過本地模型回應，而且資料不需要送到雲端 API。\u003C\u002Fp>\u003Ch2>常見錯誤\u003C\u002Fh2>\u003Cul>\u003Cli>在 16 GB 機器上直接跑全精度。修法：改用 Q4 量化，或縮小 context window。\u003C\u002Fli>\u003Cli>把所有 benchmark 數字都當成官方保證。修法：沒有模型卡明確寫出的數字，就只引用相對比較。\u003C\u002Fli>\u003Cli>用純文字 wrapper 去接多模態輸入。修法：改用支援圖片、音訊或影片 ingestion 的 runtime。\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>接下來可以看什麼\u003C\u002Fh2>\u003Cp>本地跑通之後，下一步可以做私有多模態工作流，然後拿 Gemma 4 12B 跟 \u003Ca href=\"\u002Ftag\u002Fqwen\">Qwen\u003C\u002Fa> 或其他開源權重模型，在你的真實任務上做對照。\u003C\u002Fp>","這篇教你確認 Gemma 4 12B 的硬體需求、看懂公開基準，並在本機跑起多模態模型。","www.buildfastwithai.com","https:\u002F\u002Fwww.buildfastwithai.com\u002Fblogs\u002Fgemma-4-12b-guide",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780777971165-4bit.png","model-release","zh","0e767e9d-5d17-4cd0-b6ee-0328f89eb49b",[17,18,19,20,21,22],"Gemma 4 12B","Ollama","llama.cpp","MLX","GGUF","多模態",[24,25,26],"先確認 16 GB 等級硬體，再選 Ollama、llama.cpp 或 MLX 路線。","用 Q4 類量化把 12B 模型壓進本機可用的記憶體範圍。","完成文字、圖片、音訊與影片驗證後，再把模型接進自己的應用。",0,"2026-06-06T20:32:24.857611+00:00","2026-06-06T20:32:24.837+00:00","0ccb5d2e-69f1-4354-a3e0-cb370221cd95",{"tags":32,"relatedLang":43,"relatedPosts":47},[33,35,37,39,41],{"name":21,"slug":34},"gguf",{"name":17,"slug":36},"gemma-4-12b",{"name":18,"slug":38},"ollama",{"name":19,"slug":40},"llamacpp",{"name":20,"slug":42},"mlx",{"id":15,"slug":44,"title":45,"language":46},"gemma-4-12b-specs-benchmarks-run-locally-en","Gemma 4 12B: Specs, Benchmarks & How to Run It Locally","en",[48,54,60,66,72,78],{"id":49,"slug":50,"title":51,"cover_image":52,"image_url":52,"created_at":53,"category":13},"ef42a437-8b06-4ff5-a135-ece7662c01f4","best-kimi-models-2026-k2-5-vs-k2-thinking-zh","2026 最佳 Kimi 模型：K2.5 對 K2 Thinking","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780770790333-x3lk.png","2026-06-06T18:32:39.410186+00:00",{"id":55,"slug":56,"title":57,"cover_image":58,"image_url":58,"created_at":59,"category":13},"fd2ad557-5c09-4758-964d-cda1c3c87a4c","kimi-k2-6-open-source-coding-agent-swarm-zh","Kimi K2.6 開源加上 Agent Swarm","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780761795960-0zg9.png","2026-06-06T16:02:21.702099+00:00",{"id":61,"slug":62,"title":63,"cover_image":64,"image_url":64,"created_at":65,"category":13},"8102ddec-e015-4294-9940-bf65553ae70d","minimax-m3-triple-capability-open-model-zh","MiniMax M3：開源三合一模型","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780756383081-hr0b.png","2026-06-06T14:32:35.396612+00:00",{"id":67,"slug":68,"title":69,"cover_image":70,"image_url":70,"created_at":71,"category":13},"409fc126-8ed2-42e3-bec3-9d114c4aca23","why-minimax-m3-matters-long-context-model-zh","為什麼 MiniMax M3 比又一個長上下文模型更重要","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780755468369-c0ia.png","2026-06-06T14:17:20.522361+00:00",{"id":73,"slug":74,"title":75,"cover_image":76,"image_url":76,"created_at":77,"category":13},"c92651ec-b626-49a2-bceb-230763733e3c","minimax-m3-engineer-workflow-agent-zh","MiniMax M3 讓工程師工作流更像代理","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780754606789-1jpm.png","2026-06-06T14:02:54.658299+00:00",{"id":79,"slug":80,"title":81,"cover_image":82,"image_url":82,"created_at":83,"category":13},"29e59d4e-6ccc-422b-afdb-18290e6fe168","best-open-source-llms-2026-zh","2026 最強開源 LLM 清單","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780731186899-0vr7.png","2026-06-06T07:32:37.635885+00:00",[85,90,95,100,105,110,115,120,125,130],{"id":86,"slug":87,"title":88,"created_at":89},"58b64033-7eb6-49b9-9aab-01cf8ae1b2f2","nvidia-rubin-six-chips-one-ai-supercomputer-zh","NVIDIA Rubin 把六顆晶片塞進 AI 機櫃","2026-03-26T07:18:45.861277+00:00",{"id":91,"slug":92,"title":93,"created_at":94},"0dcc2c61-c2a6-480d-adb8-dd225fc68914","march-2026-ai-model-news-what-mattered-zh","2026 年 3 月 AI 模型新聞重點","2026-03-26T07:32:08.386348+00:00",{"id":96,"slug":97,"title":98,"created_at":99},"214ab08b-5ce5-4b5c-8b72-47619d8675dd","why-small-models-are-winning-on-device-ai-zh","小模型為何吃下裝置端 AI","2026-03-26T07:36:30.488966+00:00",{"id":101,"slug":102,"title":103,"created_at":104},"785624b2-0355-4b82-adc3-de5e45eecd88","midjourney-v8-faster-images-higher-costs-zh","Midjourney V8 變快了，也變貴了","2026-03-26T07:52:03.562971+00:00",{"id":106,"slug":107,"title":108,"created_at":109},"cda76b92-d209-4134-86c1-a60f5bc7b128","xiaomi-mimo-trio-agents-robots-voice-zh","小米 MiMo 三模型瞄準代理、機器人與語音","2026-03-28T03:05:08.779489+00:00",{"id":111,"slug":112,"title":113,"created_at":114},"9e1044b4-946d-47fe-9e2a-c2ee032e1164","xiaomi-mimo-v2-pro-1t-moe-agents-zh","小米 MiMo-V2-Pro 登場：1T MoE 模型","2026-03-28T03:06:19.002353+00:00",{"id":116,"slug":117,"title":118,"created_at":119},"c4b6186f-bd84-4598-997e-c6e31d543c0d","cursor-composer-2-agentic-coding-model-zh","Cursor Composer 2 走向代理式寫碼","2026-03-28T03:13:06.422716+00:00",{"id":121,"slug":122,"title":123,"created_at":124},"e112e76f-ec3b-408f-810e-e93ae21a888a","apple-siri-gemini-distilled-models-zh","Apple Siri 牽手 Gemini 的真相","2026-03-29T04:52:57.886544+00:00",{"id":126,"slug":127,"title":128,"created_at":129},"c679b51f-194a-463b-87fc-7695256ff752","mimo-v2-pro-vs-omni-vs-flash-2026-zh","MiMo V2 Pro、Omni、Flash 怎麼選","2026-04-02T01:18:43.576128+00:00",{"id":131,"slug":132,"title":133,"created_at":134},"3b988fd7-6749-4f01-ba25-c0ad7486dc31","z-ai-glm-5v-turbo-design2code-claude-zh","GLM-5V-Turbo 在 Design2Code 贏了…","2026-04-02T04:03:36.31741+00:00"]