Xiaomi MiMo pushes 1T model to 1000 tokens/s

OraCore Editors

[MODEL] June 11, 2026 | 7 min read

Xiaomi’s MiMo-V2.5-Pro-UltraSpeed pairs a 1T model with up to 1000 tokens/s and new pricing before legacy models retire.

Xiaomi’s MiMo-V2.5-Pro-UltraSpeed is a 1T model that reaches up to 1000 tokens/s.

Xiaomi’s MiMo API Open Platform now puts a trillion-parameter model on a speed tier that claims up to 1000 tokens per second, with a limited trial price and a hard migration date for older model names. The company says MiMo-V2-Pro and MiMo-V2-Pro Omni will auto-route to V2.5 on June 1, 2026, and the legacy names will be fully deprecated by June 30.

Metric	MiMo-V2.5-Pro-UltraSpeed	MiMo-V2.5-Pro
Model size	1T parameters	Not listed on this page
Output speed	500 to 1000 tokens/s	50 to 100 tokens/s
Input cache hit price	¥0.075 / million tokens	¥0.025 / million tokens
Input cache miss price	¥9 / million tokens	¥3 / million tokens
Output price	¥18 / million tokens	¥6 / million tokens
Trial price, USD output	$2.61 / million tokens	$0.87 / million tokens

What Xiaomi is actually selling here

The headline number is speed, but the product is really a package deal: a huge model, streaming output, tool calling, and a pricing tier aimed at teams that care about response time more than raw token thrift. Xiaomi describes the mode as the “UltraSpeed experience mode” of MiMo-V2.5-Pro, and it is limited to approved users with daily capacity controls.

That matters because most model vendors still treat throughput as a tradeoff. Xiaomi is trying to sell the opposite story: keep the model large, keep the answers flowing, and push latency low enough for work that feels interactive rather than batch processed.

MiMo-V2.5-Pro-UltraSpeed is described as a 1T flagship model.
The claimed output speed range is 500 to 1000 tokens per second.
The page says access is limited to approved users, with daily capacity controls.
The model supports text input and text output.

Why the pricing change matters

Xiaomi’s own comparison table tells the story better than the marketing copy. The UltraSpeed tier costs more than standard MiMo-V2.5-Pro on every major line item, but the company frames that premium as a speed buy, not a capability buy.

For teams shipping products where latency is visible to users, that premium can make sense. A support assistant that answers in a blink feels different from one that pauses long enough to break the flow. A trading signal, a fraud check, or a code completion prompt also has a shelf life measured in seconds, sometimes milliseconds.

“When breaking news drops, the model analyzes market impact and generates trading signals within milliseconds — closing the decision loop before the market moves.”

That line comes from Xiaomi’s own “Recommended Scenarios” section, and it tells you exactly who this product is for: people building systems where delay is expensive. The company also points to real-time risk control, scientific research, and coding assistance as target use cases.

The technical trick behind the speed claim

Xiaomi says the speed jump comes from a mix of algorithm and system changes, not from custom silicon. That is a big claim, because the industry often assumes you need specialized hardware to move a model this large that fast.

The page names four pieces of the stack that matter: FP4 mixed-precision quantization, DFlash speculative decoding, TileRT system-level optimization, and heterogeneous pipeline collaboration. In plain English, Xiaomi is squeezing more work out of the GPU by shrinking some weights, predicting in blocks, keeping kernels resident longer, and splitting communication from compute more carefully.

FP4 quantization applies only to MoE experts while other parts keep original precision.
DFlash uses block-level masked parallel prediction instead of classic autoregressive drafting.
TileRT keeps the compute pipeline resident on the GPU.
The company says the design breaks through 1000 tokens/s “without requiring custom silicon.”

That last point is the one to watch. If Xiaomi can keep this experience stable under real traffic, it gives the company a clean pitch against vendors that talk about model quality while quietly accepting slower inference as the cost of doing business.
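
To make the FP4 item concrete, here is a minimal sketch of selective quantization: only tensors that belong to MoE experts are rounded to a 4-bit grid, and everything else passes through unchanged. It rests on several assumptions the page does not confirm: the E2M1 value grid (a common FP4 layout), a single per-tensor scale (production schemes typically use finer per-block scales), and the parameter names, which are hypothetical. It is not Xiaomi's implementation.

```python
import math

# Magnitudes representable by the common E2M1 FP4 layout. Assumption: the
# article does not say which FP4 layout Xiaomi actually uses.
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_fp4(values):
    """Round values to a scaled FP4 grid and return the dequantized floats."""
    peak = max((abs(v) for v in values), default=0.0)
    if peak == 0.0:
        return [0.0 for _ in values]
    scale = peak / FP4_GRID[-1]  # map the largest magnitude onto 6.0
    out = []
    for v in values:
        # Pick the grid magnitude closest to the scaled absolute value.
        nearest = min(FP4_GRID, key=lambda g: abs(abs(v) / scale - g))
        out.append(math.copysign(nearest * scale, v))
    return out


def quantize_experts_only(params):
    """Quantize tensors whose name marks them as MoE experts; keep the rest."""
    return {
        name: quantize_fp4(w) if ".experts." in name else list(w)
        for name, w in params.items()
    }


# Hypothetical parameter names and toy weights, for illustration only.
params = {
    "layer0.attn.q_proj": [0.12, -0.7, 0.33],
    "layer0.experts.0.up_proj": [6.0, 3.1, -0.4, 0.0],
}
quantized = quantize_experts_only(params)
```

In this toy run the attention weights come back untouched, while the expert weights snap to the nearest grid points. The appeal of limiting FP4 to experts is that expert weights make up most of the parameters in a typical MoE model, so that is where the memory and bandwidth savings are largest.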

How it compares with the standard Pro tier

The comparison with MiMo-V2.5-Pro is stark. UltraSpeed is priced at ¥18 per million output tokens, while Pro is ¥6. On the input side, cache hits are ¥0.075 versus ¥0.025, and cache misses are ¥9 versus ¥3.

USD pricing shows the same pattern. UltraSpeed output is listed at $2.61 per million tokens, compared with $0.87 for Pro. The input cache miss price is $1.305 versus $0.435. If you are running a high-volume app, those differences add up quickly, so the decision is less about whether UltraSpeed is better and more about whether the latency gain pays for itself.

UltraSpeed output TPS: 500 to 1000.
Pro output TPS: 50 to 100.
UltraSpeed output cost in CNY: ¥18 per million tokens.
Pro output cost in CNY: ¥6 per million tokens.

That speed gap, roughly 10x when you compare the two ranges end to end (500 vs. 50 and 1000 vs. 100 tokens/s), is the real headline. Xiaomi is not selling a slightly faster model. It is selling a different operating mode for teams that want the answer to arrive before the user has time to notice the model thinking.
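
A back-of-the-envelope sketch of that tradeoff, using only the CNY list prices and speed ranges quoted in this article. The function names and the example request size are illustrative assumptions, and the timing ignores network latency and time to first token, so treat the output as a rough comparison rather than a benchmark.

```python
# CNY per million tokens, as quoted in the comparison table.
PRICES_CNY_PER_MILLION = {
    "MiMo-V2.5-Pro-UltraSpeed": {"cache_hit": 0.075, "cache_miss": 9.0, "output": 18.0},
    "MiMo-V2.5-Pro": {"cache_hit": 0.025, "cache_miss": 3.0, "output": 6.0},
}

# Claimed output speed range in tokens per second: (low, high).
SPEED_TOKENS_PER_S = {
    "MiMo-V2.5-Pro-UltraSpeed": (500, 1000),
    "MiMo-V2.5-Pro": (50, 100),
}


def request_cost_cny(model, miss_tokens, hit_tokens, output_tokens):
    """Cost of one request in CNY, split by cached and uncached input."""
    p = PRICES_CNY_PER_MILLION[model]
    total = (
        miss_tokens * p["cache_miss"]
        + hit_tokens * p["cache_hit"]
        + output_tokens * p["output"]
    )
    return total / 1_000_000


def generation_seconds(model, output_tokens):
    """(slowest, fastest) time to stream the output at the claimed speeds."""
    low, high = SPEED_TOKENS_PER_S[model]
    return output_tokens / low, output_tokens / high


# Example request (assumed sizes): 2,000 uncached input tokens, 1,000 output tokens.
for model in PRICES_CNY_PER_MILLION:
    cost = request_cost_cny(model, miss_tokens=2000, hit_tokens=0, output_tokens=1000)
    slowest, fastest = generation_seconds(model, 1000)
    print(f"{model}: ¥{cost:.3f} per request, {fastest:.0f}-{slowest:.0f} s to stream")
```

For this example request, UltraSpeed costs three times as much (¥0.036 versus ¥0.012) and streams the answer in 1 to 2 seconds instead of 10 to 20.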

What developers should do before June 2026

The migration notice is easy to ignore until it becomes a production issue. Xiaomi says MiMo-V2-Pro and MiMo-V2-Pro Omni will auto-route to V2.5 on June 1, 2026 at 00:00 GMT+8, and the legacy names will be fully deprecated by June 30.

That gives developers a short runway to test pricing, throughput, and any prompt or tool-calling differences before the old endpoints disappear. If you are already using the platform, the sensible move is to schedule a migration sprint now, not after the routing change is live.
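
One way to keep the deadline visible is a small guard in your own configuration checks. The sketch below classifies a model name against the two dates in the notice. The status labels and the function name are hypothetical, the legacy names are written as they appear in this article (the exact API identifier strings may differ), and because the notice gives only a date for deprecation, the cutoff time on June 30 is an assumption.

```python
from datetime import datetime, timedelta, timezone

GMT8 = timezone(timedelta(hours=8))

# Legacy names as written in the article; actual API identifiers may differ.
LEGACY_MODELS = {"MiMo-V2-Pro", "MiMo-V2-Pro Omni"}

# From the notice: auto-routing begins June 1, 2026 at 00:00 GMT+8.
AUTO_ROUTE_AT = datetime(2026, 6, 1, 0, 0, tzinfo=GMT8)

# Assumption: only the date is given, so use the start of June 30 GMT+8.
DEPRECATED_AT = datetime(2026, 6, 30, 0, 0, tzinfo=GMT8)


def legacy_status(model: str, now: datetime) -> str:
    """Classify a model name relative to the migration timeline.

    `now` must be timezone-aware so it compares correctly with the cutoffs.
    """
    if model not in LEGACY_MODELS:
        return "current"
    if now < AUTO_ROUTE_AT:
        return "legacy-active"
    if now < DEPRECATED_AT:
        return "auto-routed"
    return "deprecated"


# Usage: warn in CI or at startup when a configured model is on its way out.
status = legacy_status("MiMo-V2-Pro", datetime.now(timezone.utc))
if status != "current":
    print(f"MiMo-V2-Pro is {status}; plan the move to the V2.5 family.")
```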

For teams evaluating the model for the first time, the practical question is simple: do you need a large model that feels instant, or do you need the cheapest possible token bill? Xiaomi is clearly betting that a slice of the market will pay for speed, and the June 2026 cutoff means the company wants everyone on the newer family anyway.

The next interesting test is whether this 1T, 1000 tokens/s claim holds up outside Xiaomi’s demo flow. If it does, the company will have something rare in enterprise AI: a speed story that is tied to real pricing, a real migration deadline, and a model family developers can actually plan around.
