[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-gemma-4-256k-context-open-models-en":3,"article-related-gemma-4-256k-context-open-models-en":30,"series-model-release-17a7dc8b-25e4-4993-b0dd-b23733390007":81},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":22,"views":26,"created_at":27,"published_at":28,"topic_cluster_id":29},"17a7dc8b-25e4-4993-b0dd-b23733390007","gemma-4-256k-context-open-models-en","Gemma 4 brings 256K context to open models","\u003Cp data-speakable=\"summary\">Google’s Gemma 4 adds multimodal input, 256K context, and five open-weight model sizes.\u003C\u002Fp>\u003Cp>\u003Ca href=\"\u002Ftag\u002Fgoogle-deepmind\">Google DeepMind\u003C\u002Fa> has updated \u003Ca href=\"https:\u002F\u002Fai.google.dev\u002Fgemma\" target=\"_blank\" rel=\"noopener\">Gemma\u003C\u002Fa> with a fourth-generation model family that can read text, images, and, in some sizes, audio. The headline number is the context window: up to 256,000 tokens, which puts long-document work and multi-turn agent tasks in a much more practical range.\u003C\u002Fp>\u003Cp>The release is split across five sizes, from E2B and E4B for on-device and edge deployments up to 12B, 26B A4B, and 31B for heavier workloads. 
Google also says the models ship as open weights in both pre-trained and instruction-tuned forms, under an \u003Ca href=\"https:\u002F\u002Fopensource.org\u002Flicense\u002Fapache-2-0\" target=\"_blank\" rel=\"noopener\">Apache 2.0\u003C\u002Fa> license.\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>Model\u003C\u002Fth>\u003Cth>Params\u003C\u002Fth>\u003Cth>Context\u003C\u002Fth>\u003Cth>Modalities\u003C\u002Fth>\u003Cth>Notes\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>E2B\u003C\u002Ftd>\u003Ctd>2.3B effective, 5.1B with embeddings\u003C\u002Ftd>\u003Ctd>128K\u003C\u002Ftd>\u003Ctd>Text, image, audio\u003C\u002Ftd>\u003Ctd>Designed for efficient on-device use\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>E4B\u003C\u002Ftd>\u003Ctd>4.5B effective, 8B with embeddings\u003C\u002Ftd>\u003Ctd>128K\u003C\u002Ftd>\u003Ctd>Text, image, audio\u003C\u002Ftd>\u003Ctd>Small model with audio support\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>12B Unified\u003C\u002Ftd>\u003Ctd>11.95B\u003C\u002Ftd>\u003Ctd>256K\u003C\u002Ftd>\u003Ctd>Text, image, audio\u003C\u002Ftd>\u003Ctd>Decoder-only unified design\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>26B A4B MoE\u003C\u002Ftd>\u003Ctd>25.2B total, 3.8B active\u003C\u002Ftd>\u003Ctd>256K\u003C\u002Ftd>\u003Ctd>Text, image\u003C\u002Ftd>\u003Ctd>Mixture-of-experts model\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>31B\u003C\u002Ftd>\u003Ctd>30.7B\u003C\u002Ftd>\u003Ctd>256K\u003C\u002Ftd>\u003Ctd>Text, image\u003C\u002Ftd>\u003Ctd>Largest dense model in the family\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>Gemma 4 is built for long context and mixed inputs\u003C\u002Fh2>\u003Cp>Gemma 4 is not a single model with one deployment target. It is a family, and the split matters. 
The smaller E2B and E4B models are aimed at devices that need speed and lower memory use, while the 12B, 26B A4B, and 31B models are meant for stronger GPUs, workstations, and server-side \u003Ca href=\"\u002Ftag\u002Finference\">inference\u003C\u002Fa>.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781686085345-8dgk.png\" alt=\"Gemma 4 brings 256K context to open models\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>That spread makes Gemma 4 more useful than a one-size-fits-all release. A mobile assistant, a desktop coding tool, and a document analysis service do not need the same tradeoff between latency, memory, and quality. Google is trying to cover all three with the same model line.\u003C\u002Fp>\u003Cp>One practical detail is the context window. The smaller models use 128K tokens, and the mid and larger models go to 256K. That is enough room for long reports, large codebases, or many-turn conversations without chopping the input into tiny pieces.\u003C\u002Fp>\u003Cul>\u003Cli>E2B and E4B support text, image, and audio.\u003C\u002Fli>\u003Cli>12B Unified supports text, image, and audio without separate encoders.\u003C\u002Fli>\u003Cli>26B A4B and 31B focus on text and image, with 256K context.\u003C\u002Fli>\u003Cli>All five sizes are open-weight releases.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>The architecture choices are doing real work\u003C\u002Fh2>\u003Cp>Google’s documentation says Gemma 4 uses a mix of dense and mixture-of-experts designs, plus a hybrid attention scheme that alternates local sliding-window attention with global attention. That is the kind of engineering detail that usually decides whether a model feels fast in practice or just looks impressive on a chart.\u003C\u002Fp>\u003Cp>The 26B A4B model is the clearest example. 
It has 25.2B total parameters, but only 3.8B, roughly 15% of them, are active during inference. That means per-token compute is closer to that of a much smaller model, while the full 25.2B weights still need to be held in memory.\u003C\u002Fp>\u003Cp>The smaller models also use per-layer embeddings, which Google says improve parameter efficiency for on-device deployment. In plain English: the model family is trying to save memory where it matters most, without stripping out the features developers actually want.\u003C\u002Fp>\u003Cblockquote>\u003Cp>“The future of AI is open,” said Demis Hassabis, co-founder and CEO of Google DeepMind, in a 2024 blog post announcing Gemma.\u003C\u002Fp>\u003C\u002Fblockquote>\u003Cp>That line matters here because Gemma 4 keeps Google’s open-weight story alive while adding capabilities that used to be reserved for bigger proprietary systems. The company is clearly betting that developers want models they can inspect, tune, and ship in more places.\u003C\u002Fp>\u003Ch2>The benchmarks show strength, but the spread is the story\u003C\u002Fh2>\u003Cp>Google’s \u003Ca href=\"\u002Ftag\u002Fbenchmark\">benchmark\u003C\u002Fa> table is broad, and the numbers show a family with clear tiers. The 31B model leads on most benchmarks, but the 26B A4B model often gets close while using far fewer active parameters. 
That is exactly the kind of tradeoff teams care about when they are paying for inference.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781686086131-m5h6.png\" alt=\"Gemma 4 brings 256K context to open models\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Here are a few of the more telling results from the instruction-tuned models:\u003C\u002Fp>\u003Cul>\u003Cli>MMLU Pro: 85.2% for 31B, 82.6% for 26B A4B, 77.2% for 12B Unified.\u003C\u002Fli>\u003Cli>LiveCodeBench v6: 80.0% for 31B, 77.1% for 26B A4B, 72.0% for 12B Unified.\u003C\u002Fli>\u003Cli>Codeforces Elo: 2150 for 31B, 1718 for 26B A4B, 1659 for 12B Unified.\u003C\u002Fli>\u003Cli>MRCR v2 at 128K: 66.4% for 31B, 44.1% for 26B A4B, 43.4% for 12B Unified.\u003C\u002Fli>\u003C\u002Ful>\u003Cp>The coding and reasoning numbers are especially interesting. A 2150 Codeforces Elo is a serious result for a general-purpose model family, and the 26B A4B model scoring 1718 against 1659 for the 12B Unified model, with only 3.8B active parameters, suggests the MoE design is doing more than saving compute on paper.\u003C\u002Fp>\u003Cp>There is also a visible drop-off in long-context retrieval as you move down the stack: MRCR v2 at 128K falls from 66.4% on the 31B model to 44.1% and 43.4% on the two smaller ones. That is expected, but it is the number to watch if you plan to stuff entire docs, transcripts, or repos into the prompt.\u003C\u002Fp>\u003Ch2>What developers can actually build with it\u003C\u002Fh2>\u003Cp>Gemma 4 is aimed at more than chat. Google highlights text generation, programming, reasoning, function calling, and multimodal understanding. The model family also adds built-in support for the system role, which makes structured prompting and agent-style workflows easier to manage.\u003C\u002Fp>\u003Cp>That matters because a lot of model releases talk about “agents” without making the plumbing any better. 
Here, the combination of function calling, \u003Ca href=\"\u002Ftag\u002Flong-context\">long context\u003C\u002Fa>, and system prompt support gives teams a cleaner base for assistants that need to read, decide, and act.\u003C\u002Fp>\u003Cp>If you are comparing it with other open-weight options, the practical question is deployment fit. Smaller Gemma 4 models are a better match for local apps and edge devices, while the larger ones make more sense for server-side tools that need stronger reasoning and document handling. For teams already using \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002F\" target=\"_blank\" rel=\"noopener\">Hugging Face\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fwww.ollama.com\u002F\" target=\"_blank\" rel=\"noopener\">Ollama\u003C\u002Fa>, or \u003Ca href=\"https:\u002F\u002Flmstudio.ai\u002F\" target=\"_blank\" rel=\"noopener\">LM Studio\u003C\u002Fa>, the open-weight format lowers the friction of testing these models in real workflows.\u003C\u002Fp>\u003Cp>Google also points developers to its broader ecosystem, including \u003Ca href=\"https:\u002F\u002Fdevelopers.googleblog.com\u002F\" target=\"_blank\" rel=\"noopener\">Google Developers Blog\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fdevelopers.google.com\u002Fai\" target=\"_blank\" rel=\"noopener\">Google AI\u003C\u002Fa>, and \u003Ca href=\"https:\u002F\u002Fcloud.google.com\u002Fvertex-ai\" target=\"_blank\" rel=\"noopener\">Vertex AI\u003C\u002Fa>. That gives Gemma 4 a straightforward path from local experiments to managed deployment.\u003C\u002Fp>\u003Ch2>Gemma 4 is a practical release, not a novelty drop\u003C\u002Fh2>\u003Cp>The most interesting thing about Gemma 4 is that Google did not optimize for a single headline metric. 
It built a family with real deployment variety, long context, multimodal input, and enough benchmark strength to matter in production conversations.\u003C\u002Fp>\u003Cp>If you are building a document assistant, a coding \u003Ca href=\"\u002Ftag\u002Fcopilot\">copilot\u003C\u002Fa>, or a multimodal agent, the first question is no longer whether an open model can handle the workload. The question is which Gemma 4 size fits your latency budget, memory ceiling, and context length.\u003C\u002Fp>\u003Cp>My bet: the 26B A4B model will get the most attention from developers who want strong results without paying full dense-model costs, while the 12B Unified model will be the sleeper choice for teams that care about multimodal input and simpler architecture. The next thing to watch is whether third-party tooling catches up fast enough to make those choices easy to test outside Google’s own stack.\u003C\u002Fp>","Google’s Gemma 4 adds text, image, and audio input, plus up to 256K context and five model sizes for local or server use.","ai.google.dev","https:\u002F\u002Fai.google.dev\u002Fgemma\u002Fdocs\u002Fcore\u002Fmodel_card_4",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781686085345-8dgk.png","model-release","en","1d12108f-e96c-405e-b7fa-2c2527b2797a",[17,18,19,20,21],"Gemma 4","open-weight models","multimodal AI","long context","Google DeepMind",[23,24,25],"Gemma 4 adds text, image, and audio input, with up to 256K context in larger models.","The 26B A4B model uses 3.8B active parameters, which makes MoE efficiency a major part of the release.","Google is positioning Gemma 4 for on-device apps, agent workflows, and server-side 
deployments.",0,"2026-06-17T08:47:34.623499+00:00","2026-06-17T08:47:34.615+00:00","1bae1133-d241-4581-9332-fbf39690c319",{"tags":31,"relatedLang":40,"relatedPosts":44},[32,34,36,38],{"name":17,"slug":33},"gemma-4",{"name":19,"slug":35},"multimodal-ai",{"name":20,"slug":37},"long-context",{"name":21,"slug":39},"google-deepmind",{"id":15,"slug":41,"title":42,"language":43},"gemma-4-256k-context-open-models-zh","Gemma 4 把 256K 上下文帶進開放模型","zh",[45,51,57,63,69,75],{"id":46,"slug":47,"title":48,"cover_image":49,"image_url":49,"created_at":50,"category":13},"15982ebe-5f2f-44c0-ade7-6f47a149cb1e","kimi-k2-7-code-api-kimi-code-first-en","Kimi K2.7 Code 该优先上 API 和 Kimi Code，而不是等生态成熟","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781631170958-dj9h.png","2026-06-16T17:32:23.025066+00:00",{"id":52,"slug":53,"title":54,"cover_image":55,"image_url":55,"created_at":56,"category":13},"68ef0621-c9fa-4a58-9bdf-51c3f5ac6bce","kingdom-hearts-iv-confirmed-switch-2-launch-en","Kingdom Hearts IV confirmed for Switch 2 launch","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781615872914-vfne.png","2026-06-16T13:17:24.943789+00:00",{"id":58,"slug":59,"title":60,"cover_image":61,"image_url":61,"created_at":62,"category":13},"ad479036-974c-4ea3-8f42-b446afa9f600","gemini-3-5-live-translate-rolls-out-70-languages-en","Gemini 3.5 Live Translate rolls out in 70+ languages","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781489873452-jntq.png","2026-06-15T02:17:26.444846+00:00",{"id":64,"slug":65,"title":66,"cover_image":67,"image_url":67,"created_at":68,"category":13},"025d1488-1e51-4b77-8d7e-aadd35a65366","openai-5-6-model-significant-improvements-en","OpenAI’s 5.6 model hints at a bigger 
jump","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781460175729-4hg5.png","2026-06-14T18:02:30.197438+00:00",{"id":70,"slug":71,"title":72,"cover_image":73,"image_url":73,"created_at":74,"category":13},"9265411b-cd2a-4d84-ad56-591fe8f53beb","glm-52-open-frontier-ai-for-developers-en","GLM-5.2把前沿模型变成可用工具","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781442210978-n218.png","2026-06-14T13:03:03.519515+00:00",{"id":76,"slug":77,"title":78,"cover_image":79,"image_url":79,"created_at":80,"category":13},"af05bd77-6c80-4f89-937d-bc0d935b1c57","openai-files-ipo-paperwork-scrutiny-grows-en","OpenAI files IPO paperwork as scrutiny grows","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781427772574-pn1k.png","2026-06-14T09:02:25.184548+00:00",[82,87,92,97,102,107,112,117,122,127],{"id":83,"slug":84,"title":85,"created_at":86},"d4cffde7-9b50-4cc7-bb68-8bc9e3b15477","nvidia-rubin-ai-supercomputer-en","NVIDIA Unveils Rubin: A Leap in AI Supercomputing","2026-03-25T16:24:35.155565+00:00",{"id":88,"slug":89,"title":90,"created_at":91},"eab919b9-fbac-4048-89fc-afad6749ccef","google-gemini-ai-innovations-2026-en","Google's AI Leap with Gemini Innovations in 2026","2026-03-25T16:27:18.841838+00:00",{"id":93,"slug":94,"title":95,"created_at":96},"5f5cfc67-3384-4816-a8f6-19e44d90113d","gap-google-gemini-ai-checkout-en","Gap Teams Up with Google Gemini for AI-Driven Checkout","2026-03-25T16:27:46.483272+00:00",{"id":98,"slug":99,"title":100,"created_at":101},"f6d04567-47f6-49ec-804c-52e61ab91225","ai-model-release-wave-march-2026-en","Navigating the AI Model Release Wave of March 2026","2026-03-25T16:28:45.409716+00:00",{"id":103,"slug":104,"title":105,"created_at":106},"895c150c-569e-4fdf-939d-dade785c990e","small-language-models-transform-ai-en","Small Language 
Models: Llama 3.2 and Phi-3 Transform AI","2026-03-25T16:30:26.688313+00:00",{"id":108,"slug":109,"title":110,"created_at":111},"38eb1d26-d961-4fd3-ae12-9c4089680f5f","midjourney-v8-alpha-features-pricing-en","Midjourney V8 Alpha: A Deep Dive into Its Features and Pricing","2026-03-26T01:25:36.387587+00:00",{"id":113,"slug":114,"title":115,"created_at":116},"bf36bb9e-3444-4fb8-ab19-0df6bc9d8271","rag-2026-indispensable-ai-bridge-en","RAG in 2026: The Indispensable AI Bridge","2026-03-26T01:28:34.472046+00:00",{"id":118,"slug":119,"title":120,"created_at":121},"60881d6d-2310-44ef-b1fb-7f98e9dd2f0e","xiaomi-mimo-trio-agents-robots-voice-en","Xiaomi’s MiMo trio targets agents, robots, and voice","2026-03-28T03:05:08.899895+00:00",{"id":123,"slug":124,"title":125,"created_at":126},"f063d8d1-41d1-4de4-8ebc-6c40511b9369","xiaomi-mimo-v2-pro-1t-moe-agents-en","Xiaomi MiMo-V2-Pro: 1T MoE Model for Agents","2026-03-28T03:06:19.238032+00:00",{"id":128,"slug":129,"title":130,"created_at":131},"a1379e9a-6785-4ff5-9b0a-8cff55f8264f","cursor-composer-2-started-from-kimi-en","Cursor’s Composer 2 started from Kimi","2026-03-28T03:11:59.132398+00:00"]