Xiaomi MiMo-V2.5-Pro: pricing, benchmarks, and limits
Xiaomi’s MiMo-V2.5-Pro pairs a 1M-token context with strong coding, agentic, and reasoning scores at mid-range pricing.

Xiaomi released MiMo-V2.5-Pro on April 22, 2026, and the numbers make it easy to see why people are paying attention. It ships with a 1,048,576-token context window, 131,072 max completion tokens, and pricing that lands at $0.435 per million input tokens and $0.87 per million output tokens.
That combination puts it in a very specific class of model: one built for long documents, agent workflows, and code-heavy tasks rather than image or video work. The model is available through providers including Xiaomi, Novita, DigitalOcean, and DeepInfra.
| Metric | Value | Why it matters |
|---|---|---|
| Release date | April 22, 2026 | Shows this is a current flagship release |
| Context window | 1,048,576 tokens | Useful for long documents and multi-file work |
| Input price | $0.435 per 1M tokens | Mid-range cost for heavy usage |
| Output price | $0.87 per 1M tokens | Competitive for long responses |
| Intelligence index | 42.2 | Signals broad reasoning quality |
| Coding index | 60.2 | Points to strong software work |
| Agentic index | 68.7 | Suggests solid tool use and autonomy |
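To make the pricing concrete, here is a minimal sketch that turns the listed per-million-token prices into a per-request estimate. The function name `estimate_cost` and the example token counts are hypothetical, and the sketch assumes flat pricing with no caching discounts, tiering, or provider-specific markups.

```python
from decimal import Decimal

# Prices quoted in the article, in USD per 1M tokens.
INPUT_PRICE_PER_M = Decimal("0.435")
OUTPUT_PRICE_PER_M = Decimal("0.87")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one request at the listed prices.

    Assumption: flat per-token pricing, with no discounts or tiering.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) / TOKENS_PER_M * INPUT_PRICE_PER_M
    output_cost = Decimal(output_tokens) / TOKENS_PER_M * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Hypothetical request: 200,000 input tokens and 8,000 output tokens.
print(estimate_cost(200_000, 8_000))
```

Under these assumptions, input tokens dominate the bill for long-document workloads, even though the output price is twice the input price.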
What Xiaomi is actually selling here
MiMo-V2.5-Pro is Xiaomi’s top-tier text model in this family, and it is tuned for the kind of work that makes models feel useful in production: coding, tool use, function calling, and long-horizon reasoning. The model is not trying to be a multimodal Swiss army knife. It is text-only, which narrows the use case but also makes the positioning clearer.

That clarity matters. A lot of model launches blur the line between “can do everything” and “does any one thing well.” Xiaomi took the opposite route here. The public data points to a model that is meant to sit inside agent pipelines, software engineering assistants, and document analysis systems where context length and instruction following matter more than flashy demos.
Artificial Analysis gives MiMo-V2.5-Pro an intelligence index of 42.2, a coding index of 60.2, and an agentic index of 68.7. Those are the numbers to weigh if you are deciding whether it belongs in a production stack.
- Text-only modality, so no native vision support in this variant
- 1M-token context for large codebases and long documents
- Function calling and tool use for agent workflows
- Mid-range pricing compared with other professional models
Benchmarks show strength in the right places
The benchmark picture is more interesting than a single headline score. Xiaomi’s model does well on scientific reasoning, instruction following, and agentic tool use, which lines up with the product pitch. On the public benchmark sheet, it posts 86.6% on GPQA Diamond, 94.2% on τ²-Bench, 79.9% on IFBench, and 73.3% on LCR.
Those are not vanity metrics. GPQA Diamond tests graduate-level science questions, τ²-Bench measures conversational agent behavior, IFBench looks at instruction following, and LCR checks long-context reasoning. Put together, they suggest a model that can hold state across large inputs and stay on task when the prompt gets messy.
“The model is very good at following instructions and using tools, which makes it suitable for long, document-heavy workflows.”
That line from the Artificial Analysis model page captures the practical upside better than any marketing copy could. If you are building an internal assistant that reads tickets, edits code, and calls tools, these are the traits that matter.
- GPQA Diamond: 86.6%
- τ²-Bench: 94.2%
- IFBench: 79.9%
- LCR: 73.3%
How it compares on cost and capability
Pricing is where MiMo-V2.5-Pro becomes easier to place. At $0.435 per million input tokens and $0.87 per million output tokens, it lands in the same general bracket as DeepSeek V4 Pro, which is listed as its closest pricing match. Xiaomi also gets a -4 point regional accessibility adjustment in the provider profile, which is worth noting if you care about deployment friction.

The comparison set matters because model buyers rarely shop in a vacuum. The nearby alternatives listed alongside it are MiMo-V2-Pro, MiMo-V2.5, and Kimi K2.6. In other words, Xiaomi is not asking buyers to treat this as a lone outlier. It is part of a crowded band of professional models that compete on context, coding, and agent behavior.
Here is the practical comparison that jumps out:
- MiMo-V2.5-Pro: 1M context, $0.435 input, $0.87 output
- DeepSeek V4 Pro: similar price band, useful as a direct benchmark rival
- MiMo-V2-Pro: lower-tier sibling for teams that do not need the flagship profile
- Kimi K2.6: another nearby option in the same capability range
If your workload is mostly short prompts, this model is overkill. If your workload is large repositories, agent loops, or multi-document reasoning, the extra context headroom is the reason to care.
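One quick way to judge whether a workload needs the headroom is to check a prompt's token count against the listed limits. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the prompt and the completion share the same 1,048,576-token window, which the listed figures do not confirm.

```python
CONTEXT_WINDOW = 1_048_576   # tokens, as listed in the article
MAX_COMPLETION = 131_072     # tokens, as listed in the article


def max_prompt_tokens(reserved_output: int = MAX_COMPLETION) -> int:
    """Largest prompt that still leaves `reserved_output` tokens for the reply.

    Assumption: prompt and completion are counted against one shared window.
    """
    if not 0 <= reserved_output <= MAX_COMPLETION:
        raise ValueError("reserved_output must be between 0 and MAX_COMPLETION")
    return CONTEXT_WINDOW - reserved_output


def fits(prompt_tokens: int, reserved_output: int = MAX_COMPLETION) -> bool:
    """Return True if the prompt fits alongside the reserved output budget."""
    return 0 <= prompt_tokens <= max_prompt_tokens(reserved_output)


# Hypothetical prompt sizes, counted with whatever tokenizer the provider uses.
print(max_prompt_tokens())
print(fits(900_000))
print(fits(950_000, reserved_output=8_192))
```

Reserving the full completion budget leaves 917,504 tokens for the prompt under this assumption; reserving less output frees up more room for input.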
Who should test it first
MiMo-V2.5-Pro makes the most sense for teams that already know where model latency, context, and tool use affect product quality. A software team could use it for code review helpers, repo search, and issue triage. An operations team could use it for document digestion, ticket routing, and multi-step workflows. A research team could use it for long-context reading and structured extraction.
The live performance data also gives a better sense of deployment tradeoffs. On the source page, Xiaomi lists 99% average uptime, 423ms best latency, 49 tok/s throughput, and 4/4 active endpoints. Those numbers are not perfect, but they are concrete, and they suggest the model is already being served in a real production setup rather than a lab-only demo.
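The latency and throughput figures can be combined into a rough response-time estimate. This is a back-of-the-envelope sketch with a hypothetical function name: it assumes the 423ms figure approximates time to first token and that 49 tok/s holds steady for the whole response, neither of which the listed data guarantees.

```python
BEST_LATENCY_S = 0.423     # 423 ms best latency, as listed in the article
THROUGHPUT_TOK_S = 49.0    # 49 tok/s throughput, as listed in the article


def estimate_seconds(output_tokens: int) -> float:
    """Rough wall-clock time to stream `output_tokens` tokens.

    Assumption: fixed startup latency plus constant generation speed.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return BEST_LATENCY_S + output_tokens / THROUGHPUT_TOK_S


# Hypothetical response sizes, up to the listed maximum completion length.
for n in (500, 4_000, 131_072):
    print(f"{n:>7} tokens: about {estimate_seconds(n) / 60:.1f} min")
```

At these figures, a maximum-length 131,072-token completion would take roughly 45 minutes to stream, so throughput matters at least as much as context size for long-output agent loops.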
That said, the absence of vision support is a hard boundary. If your product needs image understanding, screen analysis, or multimodal agents, this version is the wrong fit. Xiaomi’s pitch here is narrower, and that makes the evaluation easier: it is a text model for serious text work.
For readers comparing model families, our related coverage of Anthropic’s Claude Fable 5 is a useful contrast because it shows how different vendors are splitting capability across general reasoning, coding, and deployment access.
The bottom line for buyers
MiMo-V2.5-Pro looks like a model built for teams that care more about long context and agent behavior than flashy multimodal demos. The combination of a 1M-token window, strong coding scores, and mid-range pricing gives Xiaomi a credible seat at the table.
The key question is whether your workload actually needs that much context. If it does, this model deserves a pilot. If it does not, you are probably paying for capacity you will never use. My read: the most interesting test will be whether Xiaomi can turn these benchmark wins into real developer adoption over the next few quarters.