Mistral Small 2603: 256K context for $0.15 in
Mistral Small 2603 pairs a 256K context window with $0.15 input pricing, $0.60 output pricing, and strong reasoning scores.

Mistral Small 2603 combines a 256K context window with low token pricing and solid benchmark scores.
Requesty now lists Mistral AI SAS mistral-small-2603 with a 256K-token context window, $0.15 per million input tokens, and $0.60 per million output tokens. That puts it in a useful middle ground for teams that want long prompts, vision input, and structured output without paying top-tier model prices.
| Metric | Value | Why it matters |
|---|---|---|
| Input price | $0.15 / 1M tokens | Cheap enough for high-volume workloads |
| Output price | $0.60 / 1M tokens | Output is the expensive part to watch |
| Context window | 256K tokens | Fits long docs, logs, and multi-step prompts |
| MMLU Pro | 49.1% | General knowledge and reasoning signal |
| GPQA Diamond | 34.9% | Hard science questions remain a stress test |
| SciCode | 11.8% | Code-heavy science tasks are still challenging |
What Mistral Small 2603 is built to do
Get the latest AI news in your inbox
Weekly picks of model releases, tools, and deep dives — no spam, unsubscribe anytime.
No spam. Unsubscribe at any time.
Mistral describes the model as a hybrid system that unifies instruction following, reasoning, and coding in one package. It also supports vision, tool calling, and JSON schema output, which makes it a practical fit for product teams building assistants, document processors, and internal agents.

The model card on Requesty lists 119B parameters with 6.5B active, which tells you something important about the architecture: the model is large on paper, but only a small slice is active during inference. That design usually matters when teams care about cost and latency as much as raw capability.
Requesty also labels the model as a chat API with no training on customer data, a 30-day retention policy, and an EU provider location. For teams under tighter compliance review, those details matter just as much as benchmark scores.
- Model ID:
mistral/mistral-small-2603 - Added: Jun 29, 2026
- Context window: 256K tokens
- Max output: N/A
- Data retention: Yes, 30 days
- Used for training: No
The benchmarks tell a practical story
The headline numbers are decent rather than flashy. 49.1% on MMLU Pro suggests the model can handle broad knowledge tasks reasonably well, while 34.9% on GPQA Diamond shows it still struggles on graduate-level science questions that are meant to be hard to game.
For coding-adjacent work, the 11.8% SciCode score is the number to read carefully. It suggests that the model may be better as a general-purpose assistant with coding features than as a specialized scientific coding engine.
“Benchmarks are useful, but they are not the product,” said OpenAI in its evals guidance, which is a fair reminder that synthetic scores rarely capture the messy reality of real user prompts.
Requesty says the benchmark data comes from official model cards, Artificial Analysis, and public leaderboards. That combination is helpful, but it also means you should treat these numbers as a starting point, not a final verdict.
If your workload is close to retrieval, summarization, structured extraction, or tool-using assistants, the model’s mix of long context and moderate pricing may matter more than a single benchmark line. If your workload is math-heavy or research-heavy, the lower science scores are a warning sign.
- MMLU Pro: 49.1%
- GPQA Diamond: 34.9%
- SciCode: 11.8%
- Artificial Analysis Intelligence Index: 3.6%
- Released: 2023-12-11
Pricing is where this model gets interesting
At $0.15 per million input tokens and $0.60 per million output tokens, mistral-small-2603 is priced for usage, not prestige. The model becomes especially attractive when your prompts are long but your outputs are short, which is common in extraction, classification, and agent routing.

Requesty publishes estimated costs that make the math easier to sanity-check. A 100K input + 10K output request is listed at $0.0210, while a 1M input + 100K output workload lands at $0.21. At larger scale, a 10M input + 1M output pattern comes out to $2.10.
That pricing becomes even more interesting because Requesty says it charges exactly what the upstream provider charges, with no markup and no per-request fees. The company also says prompt caching and smart routing can reduce effective cost by 30% to 80%, depending on the workload.
- 100K input + 10K output: $0.0210
- 1M input + 100K output: $0.21
- 10M input + 1M output: $2.10
- Potential savings from caching and routing: 30% to 80%
That is the real tradeoff here. The raw token price is already low, but the economics improve further if your app repeats prompts, reuses context, or routes requests intelligently.
How it fits into an OpenAI-compatible stack
Requesty makes the integration story simple: point your OpenAI SDK at https://router.requesty.ai/v1, swap in your Requesty API key, and set the model name to mistral/mistral-small-2603. The example in the docs works with Python, JavaScript, and cURL, which lowers the friction for teams that already speak OpenAI-style APIs.
This matters because adoption is often less about model quality and more about migration cost. If a team can test a new provider without rewriting the app, the decision usually comes down to latency, output quality, and token economics rather than integration pain.
Requesty also points to more than 400 models behind one API key, which makes the platform more of a routing layer than a single-model shop. For teams comparing gateway options, that kind of abstraction can simplify experimentation.
Here is the short version: mistral-small-2603 looks like a practical model for long-context apps that need decent reasoning, structured outputs, and predictable pricing. It does not look like the best choice for frontier science or math benchmarks, and the benchmark table makes that clear. If your app is a document-heavy assistant, a support bot, or a tool-using workflow with lots of context, this model is worth a test run now rather than later.
// Related Articles
- [MODEL]
Doubao Seed 2.1 Pro 不是追赶者,而是 Agent 时代的均衡强者
- [MODEL]
ACE-Step 1.5 makes local music generation a real product, not a demo
- [MODEL]
Sora’s 30-seat electric aircraft clears VTOL tests
- [MODEL]
OpenAI自研芯片不是秀肌肉,而是对英伟达的真实威胁
- [MODEL]
K3s v1.34.9 lands with Kubernetes 1.34.9
- [MODEL]
Kimi 2.7 makes price the real coding benchmark