Gemini 3.5 Flash Pricing, Context, Benchmarks
Google’s Gemini 3.5 Flash costs $1.50 per million input tokens and $9 output, with a 1,048,576-token context window.

Google’s Gemini 3.5 Flash pairs a 1M-token context window with low API pricing.
Gemini 3.5 Flash is Google’s latest Flash-tier model on OpenRouter, and the headline is simple: it pushes a lot more context through the pipe without asking for Pro-level money. The model launched on May 19, 2026, and OpenRouter lists it at $1.50 per million input tokens and $9 per million output tokens.
That matters because this is the kind of model developers will actually price into products. A 1,048,576-token context window changes how teams think about long documents, codebases, transcripts, and multi-step agent loops, especially when the output side is still priced low enough for production use.
| Metric | Value |
|---|---|
| Input price | $1.50 per million tokens |
| Output price | $9 per million tokens |
| Context window | 1,048,576 tokens |
| Weekly tokens | 525B |
| Release date | May 19, 2026 |
What Google is positioning here
Get the latest AI news in your inbox
Weekly picks of model releases, tools, and deep dives — no spam, unsubscribe anytime.
No spam. Unsubscribe at any time.
Google is pitching Gemini 3.5 Flash as a high-efficiency multimodal model with near-Pro coding and reasoning quality, but at Flash-tier cost and speed. That is a very specific claim, and the product page backs it up with a focus on coding proficiency and parallel agentic execution loops.

The model accepts text, image, video, audio, and PDF inputs. In practice, that means a developer can keep one model in the loop for a code review pass, a screenshot analysis, a meeting recording, or a document-heavy workflow without switching between separate systems.
Google also says the default thinking effort is medium, with minimal, low, medium, and high options available. That gives teams a real cost-control knob. If you are building a support assistant or a document parser, medium might be enough. If you are running a harder reasoning task, you can spend more only when the task deserves it.
- Multimodal inputs: text, image, video, audio, PDF
- Default thinking effort: medium
- Thinking levels: minimal, low, medium, high
- Use case focus: coding and parallel agent loops
Why the pricing matters
At $1.50 per million input tokens, Gemini 3.5 Flash is cheap enough for long-context workloads that would have felt wasteful on older pricing tiers. The output price is higher at $9 per million tokens, which is normal for a model that can produce longer, more useful answers after chewing through a huge prompt.
That pricing split tells you where Google wants developers to be careful. Feed the model a lot of context, but keep your outputs tight when you can. For agentic systems, that could mean summarizing intermediate steps instead of dumping full traces back to the user.
“The right model is the one that gives you the best quality at the lowest cost.” — Sundar Pichai, Google I/O 2024 keynote
That quote fits this launch well. Google is not trying to sell Gemini 3.5 Flash as the most expensive or most dramatic model in the lineup. It is trying to make the economics work for products that need scale, speed, and decent reasoning in the same package.
How it compares with other model tiers
OpenRouter’s listing makes the tradeoff easy to see. Gemini 3.5 Flash sits in the same family as higher-end models, but it is priced for throughput, not prestige. The 1M-token context window is especially notable because it puts the model in a class where long-running workflows become practical instead of experimental.

For teams comparing model choices, the real question is no longer whether a model can read a large codebase. The question is whether the model can do it often enough, cheaply enough, and with enough quality to justify replacing a pile of brittle tooling. Gemini 3.5 Flash is clearly aimed at that middle ground.
- Gemini 3.5 Pro is the more premium sibling for heavier reasoning tasks
- OpenRouter models lets teams compare providers and pricing in one place
- 1M-token context means fewer chunking hacks for docs and code
- Weekly token capacity listed by OpenRouter: 525B
That 525B weekly token figure is also worth noting. It suggests OpenRouter expects serious demand, not just curiosity traffic from benchmark watchers. If a model is built for agent loops and long-context work, volume follows quickly once teams wire it into real products.
What developers should watch next
The useful part of this launch is not the marketing language, it is the combination of price, context, and multimodal support. Those three facts tell you where Gemini 3.5 Flash fits: long prompts, repeated calls, and workflows where a model has to inspect more than plain text.
If you are building on top of OpenRouter’s API, this is the sort of model that could replace several narrower steps in a pipeline. One model can inspect a PDF, read a screenshot, reason about a bug report, and draft a response without forcing your app to stitch together multiple specialized services.
The catch is that long context does not automatically mean better outcomes. Teams still need to test whether the model actually uses that extra room well, especially on code tasks where accuracy matters more than recall. The next thing to watch is benchmark data from real developer workflows, not just vendor claims.
For now, Gemini 3.5 Flash looks designed for a very practical audience: teams that want strong enough reasoning to ship features, but do not want to pay Pro-model rates every time a user pastes a giant document or a messy repo snapshot. If Google’s quality claims hold up under real usage, expect this model to become a default choice for high-volume agent and document products.
// Related Articles
- [MODEL]
Gemma 4 12B: Specs, Benchmarks & How to Run It Locally
- [MODEL]
Best Kimi Models in 2026: K2.5 vs K2 Thinking
- [MODEL]
Kimi K2.6 adds open-source coding and agent swarm
- [MODEL]
MiniMax M3: 中国首个三合一开源模型
- [MODEL]
Why MiniMax M3 matters more than another long-context model
- [MODEL]
MiniMax M3 让工程师工作流更像代理