Mistral Small 2603: 256K context for $0.15 in

Q: The benchmarks tell a practical story?

The headline numbers are decent rather than flashy. 49.1% on MMLU Pro suggests the model can handle broad knowledge tasks reasonably well, while 34.9% on GPQA Diamond shows it still struggles on graduate-level science questions that are meant to be hard to game.

Q: Pricing is where this model gets interesting?

At $0.15 per million input tokens and $0.60 per million output tokens, mistral-small-2603 is priced for usage, not prestige. The model becomes especially attractive when your prompts are long but your outputs are short, which is common in extraction, classification, and agent routing.

OraCore Editors

Back to home

[MODEL] July 3, 20266 min readOraCore Editors

Mistral Small 2603: 256K context for $0.15 in

Mistral Small 2603 pairs a 256K context window with $0.15 input pricing, $0.60 output pricing, and strong reasoning scores.

Mistral AI

Share LinkedIn

Mistral Small 2603: 256K context for $0.15 in

Mistral Small 2603 combines a 256K context window with low token pricing and solid benchmark scores.

Requesty now lists Mistral AI SAS mistral-small-2603 with a 256K-token context window, $0.15 per million input tokens, and $0.60 per million output tokens. That puts it in a useful middle ground for teams that want long prompts, vision input, and structured output without paying top-tier model prices.

Metric	Value	Why it matters
Input price	$0.15 / 1M tokens	Cheap enough for high-volume workloads
Output price	$0.60 / 1M tokens	Output is the expensive part to watch
Context window	256K tokens	Fits long docs, logs, and multi-step prompts
MMLU Pro	49.1%	General knowledge and reasoning signal
GPQA Diamond	34.9%	Hard science questions remain a stress test
SciCode	11.8%	Code-heavy science tasks are still challenging

What Mistral Small 2603 is built to do

Get the latest AI news in your inbox

Weekly picks of model releases, tools, and deep dives — no spam, unsubscribe anytime.

No spam. Unsubscribe at any time.

Mistral describes the model as a hybrid system that unifies instruction following, reasoning, and coding in one package. It also supports vision, tool calling, and JSON schema output, which makes it a practical fit for product teams building assistants, document processors, and internal agents.

The model card on Requesty lists 119B parameters with 6.5B active, which tells you something important about the architecture: the model is large on paper, but only a small slice is active during inference. That design usually matters when teams care about cost and latency as much as raw capability.

Requesty also labels the model as a chat API with no training on customer data, a 30-day retention policy, and an EU provider location. For teams under tighter compliance review, those details matter just as much as benchmark scores.

Model ID: mistral/mistral-small-2603
Added: Jun 29, 2026
Context window: 256K tokens
Max output: N/A
Data retention: Yes, 30 days
Used for training: No

The benchmarks tell a practical story

The headline numbers are decent rather than flashy. 49.1% on MMLU Pro suggests the model can handle broad knowledge tasks reasonably well, while 34.9% on GPQA Diamond shows it still struggles on graduate-level science questions that are meant to be hard to game.

For coding-adjacent work, the 11.8% SciCode score is the number to read carefully. It suggests that the model may be better as a general-purpose assistant with coding features than as a specialized scientific coding engine.

“Benchmarks are useful, but they are not the product,” said OpenAI in its evals guidance, which is a fair reminder that synthetic scores rarely capture the messy reality of real user prompts.

Requesty says the benchmark data comes from official model cards, Artificial Analysis, and public leaderboards. That combination is helpful, but it also means you should treat these numbers as a starting point, not a final verdict.

If your workload is close to retrieval, summarization, structured extraction, or tool-using assistants, the model’s mix of long context and moderate pricing may matter more than a single benchmark line. If your workload is math-heavy or research-heavy, the lower science scores are a warning sign.

MMLU Pro: 49.1%
GPQA Diamond: 34.9%
SciCode: 11.8%
Artificial Analysis Intelligence Index: 3.6%
Released: 2023-12-11

Pricing is where this model gets interesting

At $0.15 per million input tokens and $0.60 per million output tokens, mistral-small-2603 is priced for usage, not prestige. The model becomes especially attractive when your prompts are long but your outputs are short, which is common in extraction, classification, and agent routing.

Requesty publishes estimated costs that make the math easier to sanity-check. A 100K input + 10K output request is listed at $0.0210, while a 1M input + 100K output workload lands at $0.21. At larger scale, a 10M input + 1M output pattern comes out to $2.10.

That pricing becomes even more interesting because Requesty says it charges exactly what the upstream provider charges, with no markup and no per-request fees. The company also says prompt caching and smart routing can reduce effective cost by 30% to 80%, depending on the workload.

100K input + 10K output: $0.0210
1M input + 100K output: $0.21
10M input + 1M output: $2.10
Potential savings from caching and routing: 30% to 80%

That is the real tradeoff here. The raw token price is already low, but the economics improve further if your app repeats prompts, reuses context, or routes requests intelligently.

How it fits into an OpenAI-compatible stack

Requesty makes the integration story simple: point your OpenAI SDK at https://router.requesty.ai/v1, swap in your Requesty API key, and set the model name to mistral/mistral-small-2603. The example in the docs works with Python, JavaScript, and cURL, which lowers the friction for teams that already speak OpenAI-style APIs.

This matters because adoption is often less about model quality and more about migration cost. If a team can test a new provider without rewriting the app, the decision usually comes down to latency, output quality, and token economics rather than integration pain.

Requesty also points to more than 400 models behind one API key, which makes the platform more of a routing layer than a single-model shop. For teams comparing gateway options, that kind of abstraction can simplify experimentation.

Here is the short version: mistral-small-2603 looks like a practical model for long-context apps that need decent reasoning, structured outputs, and predictable pricing. It does not look like the best choice for frontier science or math benchmarks, and the benchmark table makes that clear. If your app is a document-heavy assistant, a support bot, or a tool-using workflow with lots of context, this model is worth a test run now rather than later.

// Related Articles

Mistral Small 2603: 256K context for $0.15 in

What Mistral Small 2603 is built to do

Get the latest AI news in your inbox

The benchmarks tell a practical story

Pricing is where this model gets interesting

How it fits into an OpenAI-compatible stack

Doubao Seed 2.1 Pro 不是追赶者，而是 Agent 时代的均衡强者

ACE-Step 1.5 makes local music generation a real product, not a demo

Sora’s 30-seat electric aircraft clears VTOL tests

OpenAI自研芯片不是秀肌肉，而是对英伟达的真实威胁

K3s v1.34.9 lands with Kubernetes 1.34.9

Kimi 2.7 makes price the real coding benchmark