[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-xiaomi-mimo-1t-model-1000-tokens-per-second-en":3,"article-related-xiaomi-mimo-1t-model-1000-tokens-per-second-en":30,"series-model-release-2c34e9fb-ebe7-46ca-996a-939d965159fd":83},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":22,"views":26,"created_at":27,"published_at":28,"topic_cluster_id":29},"2c34e9fb-ebe7-46ca-996a-939d965159fd","xiaomi-mimo-1t-model-1000-tokens-per-second-en","Xiaomi MiMo pushes 1T model to 1000 tokens\u002Fs","\u003Cp data-speakable=\"summary\">Xiaomi’s MiMo-V2.5-Pro-UltraSpeed is a 1T model that reaches up to \u003Ca href=\"\u002Fnews\u002Fmimo-1000-tps-1t-model-ultraspeed-en\">1000 tokens\u003C\u002Fa>\u002Fs.\u003C\u002Fp>\u003Cp>\u003Ca href=\"https:\u002F\u002Fplatform.xiaomimimo.com\u002Fdocs\u002Fen-US\u002Fmodel-intro\u002Fmimo-v2.5-pro-ultraspeed\" target=\"_blank\" rel=\"noopener\">Xiaomi’s MiMo API Open Platform\u003C\u002Fa> now puts a trillion-parameter model on a speed tier that claims up to 1000 tokens per second, with a limited trial price and a hard migration date for older model names. 
The company says \u003Ca href=\"https:\u002F\u002Fplatform.xiaomimimo.com\u002Fdocs\u002Fen-US\u002Fmodel-intro\u002Fmimo-v2.5-pro-ultraspeed\" target=\"_blank\" rel=\"noopener\">MiMo-V2-Pro\u003C\u002Fa> and \u003Ca href=\"https:\u002F\u002Fplatform.xiaomimimo.com\u002Fdocs\u002Fen-US\u002Fmodel-intro\u002Fmimo-v2.5-pro-ultraspeed\" target=\"_blank\" rel=\"noopener\">MiMo-V2-Pro Omni\u003C\u002Fa> will auto-route to V2.5 on June 1, 2026, then fully deprecate on June 30.\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>Metric\u003C\u002Fth>\u003Cth>MiMo-V2.5-Pro-UltraSpeed\u003C\u002Fth>\u003Cth>MiMo-V2.5-Pro\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>Model size\u003C\u002Ftd>\u003Ctd>1T parameters\u003C\u002Ftd>\u003Ctd>Not listed on this page\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Output speed\u003C\u002Ftd>\u003Ctd>500 to 1000 tokens\u002Fs\u003C\u002Ftd>\u003Ctd>50 to 100 tokens\u002Fs\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Input cache hit price\u003C\u002Ftd>\u003Ctd>¥0.075 \u002F million tokens\u003C\u002Ftd>\u003Ctd>¥0.025 \u002F million tokens\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Input cache miss price\u003C\u002Ftd>\u003Ctd>¥9 \u002F million tokens\u003C\u002Ftd>\u003Ctd>¥3 \u002F million tokens\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Output price\u003C\u002Ftd>\u003Ctd>¥18 \u002F million tokens\u003C\u002Ftd>\u003Ctd>¥6 \u002F million tokens\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Trial price, USD output\u003C\u002Ftd>\u003Ctd>$2.61 \u002F million tokens\u003C\u002Ftd>\u003Ctd>$0.87 \u002F million tokens\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>What Xiaomi is actually selling here\u003C\u002Fh2>\u003Cp>The headline number is speed, but the product is really a package deal: a huge model, streaming output, tool calling, and a pricing tier aimed at teams that care about response time more than raw \u003Ca 
href=\"\u002Ftag\u002Ftoken\">token\u003C\u002Fa> thrift. Xiaomi describes the mode as the “UltraSpeed experience mode” of \u003Ca href=\"https:\u002F\u002Fplatform.xiaomimimo.com\u002Fdocs\u002Fen-US\u002Fmodel-intro\u002Fmimo-v2.5-pro-ultraspeed\" target=\"_blank\" rel=\"noopener\">MiMo-V2.5-Pro\u003C\u002Fa>, and it is limited to approved users with daily capacity controls.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781129885712-1m6x.png\" alt=\"Xiaomi MiMo pushes 1T model to 1000 tokens\u002Fs\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>That matters because most model vendors still treat throughput as a tradeoff against model size. Xiaomi is trying to sell the opposite story: keep the model large, keep the answers flowing, and push latency low enough for work that feels interactive rather than batch processed.\u003C\u002Fp>\u003Cul>\u003Cli>\u003Ca href=\"https:\u002F\u002Fplatform.xiaomimimo.com\u002Fdocs\u002Fen-US\u002Fmodel-intro\u002Fmimo-v2.5-pro-ultraspeed\" target=\"_blank\" rel=\"noopener\">MiMo-V2.5-Pro-UltraSpeed\u003C\u002Fa> is described as a 1T flagship model.\u003C\u002Fli>\u003Cli>The claimed output speed range is 500 to 1000 tokens per second.\u003C\u002Fli>\u003Cli>The page says access is limited to approved users, with daily capacity controls.\u003C\u002Fli>\u003Cli>The model supports text input and text output.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>Why the pricing change matters\u003C\u002Fh2>\u003Cp>Xiaomi’s own comparison table tells the story better than the marketing copy. 
The UltraSpeed tier costs more than standard \u003Ca href=\"https:\u002F\u002Fplatform.xiaomimimo.com\u002Fdocs\u002Fen-US\u002Fmodel-intro\u002Fmimo-v2.5-pro-ultraspeed\" target=\"_blank\" rel=\"noopener\">MiMo-V2.5-Pro\u003C\u002Fa> on every major line item, but the company frames that premium as a speed buy, not a capability buy.\u003C\u002Fp>\u003Cp>For teams shipping products where latency is visible to users, that premium can make sense. A support assistant that answers in a blink feels different from one that pauses long enough to break the flow. A trading signal, a fraud check, or a code completion prompt also has a shelf life measured in seconds, sometimes milliseconds.\u003C\u002Fp>\u003Cblockquote>“When breaking news drops, the model analyzes market impact and generates trading signals within milliseconds — closing the decision loop before the market moves.”\u003C\u002Fblockquote>\u003Cp>That line comes from Xiaomi’s own “Recommended Scenarios” section, and it tells you exactly who this product is for: people building systems where delay is expensive. The company also points to real-time risk control, scientific research, and coding assistance as target use cases.\u003C\u002Fp>\u003Ch2>The technical trick behind the speed claim\u003C\u002Fh2>\u003Cp>Xiaomi says the speed jump comes from a mix of algorithm and system changes, not from custom silicon. 
That is a big claim, because the industry often assumes you need specialized hardware to move a model this large that fast.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781129878888-zi9z.png\" alt=\"Xiaomi MiMo pushes 1T model to 1000 tokens\u002Fs\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>The page names four pieces of the stack that matter: \u003Ca href=\"https:\u002F\u002Fplatform.xiaomimimo.com\u002Fdocs\u002Fen-US\u002Fmodel-intro\u002Fmimo-v2.5-pro-ultraspeed\" target=\"_blank\" rel=\"noopener\">FP4 mixed-precision quantization\u003C\u002Fa>, DFlash speculative decoding, TileRT system-level optimization, and heterogeneous pipeline collaboration. In plain English, Xiaomi is squeezing more work out of the \u003Ca href=\"\u002Ftag\u002Fgpu\">GPU\u003C\u002Fa> by shrinking some weights, predicting in blocks, keeping kernels resident longer, and splitting communication from compute more carefully.\u003C\u002Fp>\u003Cul>\u003Cli>FP4 quantization applies only to MoE experts while other parts keep original precision.\u003C\u002Fli>\u003Cli>DFlash uses block-level masked parallel prediction instead of classic autoregressive drafting.\u003C\u002Fli>\u003Cli>TileRT keeps the compute pipeline resident on the GPU.\u003C\u002Fli>\u003Cli>The company says the design breaks through 1000 tokens\u002Fs “without requiring custom silicon.”\u003C\u002Fli>\u003C\u002Ful>\u003Cp>That last point is the one to watch. 
If Xiaomi can keep this experience stable under real traffic, it gives the company a clean pitch against vendors that talk about model quality while quietly accepting slower \u003Ca href=\"\u002Ftag\u002Finference\">inference\u003C\u002Fa> as the cost of doing business.\u003C\u002Fp>\u003Ch2>How it compares with the standard Pro tier\u003C\u002Fh2>\u003Cp>The comparison with \u003Ca href=\"https:\u002F\u002Fplatform.xiaomimimo.com\u002Fdocs\u002Fen-US\u002Fmodel-intro\u002Fmimo-v2.5-pro-ultraspeed\" target=\"_blank\" rel=\"noopener\">MiMo-V2.5-Pro\u003C\u002Fa> is stark. UltraSpeed is priced at ¥18 per million output tokens, while Pro is ¥6. On the input side, cache hits are ¥0.075 versus ¥0.025, and cache misses are ¥9 versus ¥3. Every line item carries the same 3x premium.\u003C\u002Fp>\u003Cp>USD pricing shows the same 3x pattern. UltraSpeed output is listed at $2.61 per million tokens, compared with $0.87 for Pro. The input cache miss price is $1.305 versus $0.435. If you are running a high-volume app, those differences add up quickly, so the decision is less about whether UltraSpeed is better and more about whether the latency gain pays for itself.\u003C\u002Fp>\u003Cul>\u003Cli>UltraSpeed output TPS: 500 to 1000.\u003C\u002Fli>\u003Cli>Pro output TPS: 50 to 100.\u003C\u002Fli>\u003Cli>UltraSpeed output cost in CNY: ¥18 per million tokens.\u003C\u002Fli>\u003Cli>Pro output cost in CNY: ¥6 per million tokens.\u003C\u002Fli>\u003C\u002Ful>\u003Cp>That roughly 10x speed gap, bought at a 3x price, is the real headline. Xiaomi is not selling a slightly faster model. It is selling a different operating mode for teams that want the answer to arrive before the user has time to notice the model thinking.\u003C\u002Fp>\u003Ch2>What developers should do before June 2026\u003C\u002Fh2>\u003Cp>The migration notice is easy to ignore until it becomes a production issue. 
Xiaomi says \u003Ca href=\"https:\u002F\u002Fplatform.xiaomimimo.com\u002Fdocs\u002Fen-US\u002Fmodel-intro\u002Fmimo-v2.5-pro-ultraspeed\" target=\"_blank\" rel=\"noopener\">MiMo-V2-Pro\u003C\u002Fa> and \u003Ca href=\"https:\u002F\u002Fplatform.xiaomimimo.com\u002Fdocs\u002Fen-US\u002Fmodel-intro\u002Fmimo-v2.5-pro-ultraspeed\" target=\"_blank\" rel=\"noopener\">MiMo-V2-Pro Omni\u003C\u002Fa> will auto-route to V2.5 on June 1, 2026, at 00:00 GMT+8, and the legacy names will be fully deprecated by June 30.\u003C\u002Fp>\u003Cp>That gives developers a short runway to test pricing, throughput, and any prompt or tool-calling differences before the legacy model names disappear. If you are already using the platform, the sensible move is to schedule a migration sprint now, not after the routing change is live.\u003C\u002Fp>\u003Cp>For teams evaluating the model for the first time, the practical question is simple: do you need a large model that feels instant, or do you need the cheapest possible token bill? Xiaomi is clearly betting that a slice of the market will pay for speed, and the June 2026 cutoff means the company wants everyone on the newer family anyway.\u003C\u002Fp>\u003Cp>The next interesting test is whether this 1T, 1000 tokens\u002Fs claim holds up outside Xiaomi’s demo flow. 
If it does, the company will have something rare in \u003Ca href=\"\u002Ftag\u002Fenterprise-ai\">enterprise AI\u003C\u002Fa>: a speed story that is tied to real pricing, a real migration deadline, and a model family developers can actually plan around.\u003C\u002Fp>","Xiaomi’s MiMo-V2.5-Pro-UltraSpeed pairs a 1T model with up to 1000 tokens\u002Fs and new pricing before legacy models retire.","platform.xiaomimimo.com","https:\u002F\u002Fplatform.xiaomimimo.com\u002Fdocs\u002Fen-US\u002Fmodel-intro\u002Fmimo-v2.5-pro-ultraspeed",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781129885712-1m6x.png","model-release","en","19af5701-87e3-4774-be7a-8aebcbeef2a5",[17,18,19,20,21],"Xiaomi MiMo","MiMo-V2.5-Pro-UltraSpeed","token pricing","inference speed","model migration",[23,24,25],"MiMo-V2.5-Pro-UltraSpeed is a 1T model with claimed output speeds up to 1000 tokens\u002Fs.","Xiaomi is raising prices for speed while planning to deprecate MiMo-V2-Pro and Omni by June 30, 2026.","The biggest test is whether the UltraSpeed mode keeps its latency gains under real developer traffic.",0,"2026-06-10T22:17:35.756211+00:00","2026-06-10T22:17:35.75+00:00","1bae1133-d241-4581-9332-fbf39690c319",{"tags":31,"relatedLang":42,"relatedPosts":46},[32,34,36,38,40],{"name":18,"slug":33},"mimo-v25-pro-ultraspeed",{"name":20,"slug":35},"inference-speed",{"name":21,"slug":37},"model-migration",{"name":17,"slug":39},"xiaomi-mimo",{"name":19,"slug":41},"token-pricing",{"id":15,"slug":43,"title":44,"language":45},"xiaomi-mimo-1t-model-1000-tokens-per-second-zh","小米 MiMo 把 1T 模型推到 1000 tokens\u002Fs","zh",[47,53,59,65,71,77],{"id":48,"slug":49,"title":50,"cover_image":51,"image_url":51,"created_at":52,"category":13},"ba5b0d8e-5854-4bf8-b26a-98dc46cebfdb","claude-mythos-5-5000-en","Claude Mythos 
5 released: 50 million lines of code migrated in one day","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781148787938-27wa.png","2026-06-11T03:32:40.961698+00:00",{"id":54,"slug":55,"title":56,"cover_image":57,"image_url":57,"created_at":58,"category":13},"a1d8f44e-7017-4a26-b745-90e394368e59","claude-fable-5-quiet-ai-release-week-en","Claude Fable 5 leads a quiet AI release week","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781143385127-g0i2.png","2026-06-11T02:02:39.433393+00:00",{"id":60,"slug":61,"title":62,"cover_image":63,"image_url":63,"created_at":64,"category":13},"fcc083c3-dad0-40d7-8ed4-6d89bf1ae3f9","mistral-model-lineup-specialization-beats-giant-model-en","Mistral’s model lineup proves specialization beats one giant model","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781140679549-zq0x.png","2026-06-11T01:17:28.761627+00:00",{"id":66,"slug":67,"title":68,"cover_image":69,"image_url":69,"created_at":70,"category":13},"5087c618-81f0-44cf-b851-933b509f28ce","google-gemini-latest-update-maps-en","Google Gemini’s latest update centers on Maps","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781119072999-p0wf.png","2026-06-10T19:17:28.002681+00:00",{"id":72,"slug":73,"title":74,"cover_image":75,"image_url":75,"created_at":76,"category":13},"40029ff3-361e-45f8-9641-9b6b79d9ff0c","ideogram-4-0-comfyui-first-test-en","Ideogram 4.0 in ComfyUI: first test results","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781118196304-kz61.png","2026-06-10T19:02:33.786785+00:00",{"id":78,"slug":79,"title":80,"cover_image":81,"image_url":81,"created_at":82,"category":13},"318ca45b-7063-4277-a810-80668c1907fe","chatgpt-adult-mode-paused-may-2026-en","ChatGPT 
Adult Mode Is Still Paused in May 2026","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781105578095-r0qo.png","2026-06-10T15:32:26.489231+00:00",[84,89,94,99,104,109,114,119,124,129],{"id":85,"slug":86,"title":87,"created_at":88},"d4cffde7-9b50-4cc7-bb68-8bc9e3b15477","nvidia-rubin-ai-supercomputer-en","NVIDIA Unveils Rubin: A Leap in AI Supercomputing","2026-03-25T16:24:35.155565+00:00",{"id":90,"slug":91,"title":92,"created_at":93},"eab919b9-fbac-4048-89fc-afad6749ccef","google-gemini-ai-innovations-2026-en","Google's AI Leap with Gemini Innovations in 2026","2026-03-25T16:27:18.841838+00:00",{"id":95,"slug":96,"title":97,"created_at":98},"5f5cfc67-3384-4816-a8f6-19e44d90113d","gap-google-gemini-ai-checkout-en","Gap Teams Up with Google Gemini for AI-Driven Checkout","2026-03-25T16:27:46.483272+00:00",{"id":100,"slug":101,"title":102,"created_at":103},"f6d04567-47f6-49ec-804c-52e61ab91225","ai-model-release-wave-march-2026-en","Navigating the AI Model Release Wave of March 2026","2026-03-25T16:28:45.409716+00:00",{"id":105,"slug":106,"title":107,"created_at":108},"895c150c-569e-4fdf-939d-dade785c990e","small-language-models-transform-ai-en","Small Language Models: Llama 3.2 and Phi-3 Transform AI","2026-03-25T16:30:26.688313+00:00",{"id":110,"slug":111,"title":112,"created_at":113},"38eb1d26-d961-4fd3-ae12-9c4089680f5f","midjourney-v8-alpha-features-pricing-en","Midjourney V8 Alpha: A Deep Dive into Its Features and Pricing","2026-03-26T01:25:36.387587+00:00",{"id":115,"slug":116,"title":117,"created_at":118},"bf36bb9e-3444-4fb8-ab19-0df6bc9d8271","rag-2026-indispensable-ai-bridge-en","RAG in 2026: The Indispensable AI Bridge","2026-03-26T01:28:34.472046+00:00",{"id":120,"slug":121,"title":122,"created_at":123},"60881d6d-2310-44ef-b1fb-7f98e9dd2f0e","xiaomi-mimo-trio-agents-robots-voice-en","Xiaomi’s MiMo trio targets agents, robots, and 
voice","2026-03-28T03:05:08.899895+00:00",{"id":125,"slug":126,"title":127,"created_at":128},"f063d8d1-41d1-4de4-8ebc-6c40511b9369","xiaomi-mimo-v2-pro-1t-moe-agents-en","Xiaomi MiMo-V2-Pro: 1T MoE Model for Agents","2026-03-28T03:06:19.238032+00:00",{"id":130,"slug":131,"title":132,"created_at":133},"a1379e9a-6785-4ff5-9b0a-8cff55f8264f","cursor-composer-2-started-from-kimi-en","Cursor’s Composer 2 started from Kimi","2026-03-28T03:11:59.132398+00:00"]