Kimi’s long-context push keeps getting bigger

OraCore Editors

[MODEL] June 24, 2026 · 7 min read · OraCore Editors

Moonshot AI’s Kimi chatbot keeps expanding context, agents, and model size, with Kimi K2.5 arriving in January 2026.

Tags: Moonshot AI, long context, agentic AI

Moonshot AI’s Kimi chatbot has grown from a long-context assistant into a family of large agentic models.

Kimi entered closed beta in October 2023 as Moonshot AI’s answer to the long-context problem, and it got attention fast because the first public version, released the following month, could handle 128,000 tokens. By January 2026, the line had expanded into Kimi K2.5, a 1-trillion-parameter mixture-of-experts model with 32 billion active parameters.

Version	Release	Key number	Why it matters
Kimi chatbot	November 2023	128,000 tokens	First public release with ultra-long context
Kimi Explore Edition	October 2024	36 million+ MAU	Search-driven edition reached mass usage
Kimi K2	July 2025	1 trillion parameters, 32 billion active	Open-weight model with strong coding results
Kimi K2.5	January 2026	1 trillion parameters, 32 billion active	Added multimodal and agentic features

Moonshot AI built Kimi around long context first

Moonshot AI was founded in March 2023, and Kimi arrived later that year as a chatbot built for very long inputs. That focus mattered because most assistants in 2023 still struggled once a conversation or document got too large.

The original Kimi release in November 2023 supported 128,000 tokens of context, which made it one of the first public models to handle that scale. In practical terms, that meant users could paste long papers, codebases, or dense research notes without chopping them into tiny pieces.

Moonshot pushed that idea further in March 2024 with a beta update that supported a 2 million character context window. Then in July 2024, the company opened public beta for context caching, a feature aimed at making repeated long prompts cheaper and faster to process.

October 2023: closed beta begins
November 2023: public release with 128,000 tokens
March 2024: 2 million character context beta
July 2024: context caching enters public beta
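
The idea behind context caching can be sketched in a few lines: process a long shared prefix once, then reuse the result for every follow-up question about the same document. The sketch below is a toy illustration under stated assumptions, not Moonshot's API. The names PrefixCache, expensive_prefill, and answer are hypothetical, and the "model state" is just a word count standing in for the expensive work a real service would cache.

```python
import hashlib


class PrefixCache:
    """Toy cache keyed by a hash of the shared prompt prefix (hypothetical)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, prefix, compute):
        # Identical prefixes hash to the same key, so the work runs only once.
        key = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = compute(prefix)
        return self._store[key]


def expensive_prefill(prefix):
    # Stand-in for the costly step of processing a long document.
    # A real system would keep model state; this toy keeps a word count.
    return {"words": len(prefix.split())}


def answer(cache, document, question):
    # Only the short question changes between calls; the document is reused.
    state = cache.get_or_compute(document, expensive_prefill)
    return f"{question} (document has {state['words']} words)"


cache = PrefixCache()
document = "long report text " * 1000
first = answer(cache, document, "What is the summary?")
second = answer(cache, document, "Who is the author?")
print(f"misses={cache.misses} hits={cache.hits}")
```

The second question reuses the cached prefix, which is where the cost and latency savings for repeated long prompts would come from in a real deployment.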

The product shifted from chat to agent behavior

By late 2024, Kimi was moving past plain question answering. On 11 October 2024, Moonshot AI launched Kimi Explore Edition, which added autonomous search features. The company later said monthly active users passed 36 million, a useful sign that the product had broken out of niche AI-tinkerer territory.

That same year, Moonshot also began internal testing of video generation. The pattern is pretty clear: Kimi was no longer being positioned as a chat box that summarizes text. It was becoming a tool that searches, plans, drafts, and eventually creates multi-step outputs.

“We believe the best model is the one that can think, use tools, and solve real problems.” — Yang Zhilin, Moonshot AI

The quote above captures Moonshot’s direction well. Yang Zhilin has repeatedly framed Kimi as a system for reasoning and long-horizon work, not a novelty chatbot. That matters because the company’s releases keep adding agent-like behavior instead of only chasing benchmark chatter.

Kimi K1.5, K2, and K2.5 show the pace of the model line

The release cadence from 2025 into 2026 is the most revealing part of Kimi’s story. On 20 January 2025, Moonshot AI released Kimi K1.5 and claimed it matched OpenAI o1 on mathematics, coding, and multimodal reasoning. In April 2025, the company followed with Kimi-VL, a 16 billion parameter open-source mixture-of-experts model with 3 billion active parameters.

June 2025 brought Kimi-Dev, a 72B coding model based on Qwen2.5-72B that hit state-of-the-art results among open-source models on SWE-bench Verified. Moonshot also launched Kimi-Researcher, an autonomous research agent available through the app and website.

Then came the bigger jump. In July 2025, Moonshot AI released Kimi K2, a 1 trillion parameter MoE model with 32 billion active parameters, open sourced under a modified MIT license. In September 2025, the updated Kimi-K2-Instruct-0905 expanded context from 128K to 256K tokens and improved coding performance. By January 2026, Kimi K2.5 added multimodal vision and language understanding plus instant and thinking modes.

Kimi K1.5: January 2025
Kimi K2: July 2025, 1T parameters, 32B active
Kimi-K2-Instruct-0905: September 2025, 256K context
Kimi K2.5: January 2026, multimodal and agentic
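
The gap between total and active parameters is easier to read as a ratio. The short sketch below uses only the figures quoted in this article and computes what share of each mixture-of-experts model is used per token. The helper name active_fraction is ours, introduced for illustration.

```python
def active_fraction(total_params, active_params):
    """Share of a mixture-of-experts model's parameters used per token."""
    return active_params / total_params


# (total parameters, active parameters), as quoted in the article.
models = {
    "Kimi-VL": (16e9, 3e9),
    "Kimi K2": (1e12, 32e9),
    "Kimi K2.5": (1e12, 32e9),
}

for name, (total, active) in models.items():
    share = active_fraction(total, active)
    print(f"{name}: {share:.1%} of parameters active per token")
```

On these numbers, Kimi K2 and K2.5 touch only a few percent of their weights for any given token, which is why a 1 trillion parameter model can cost far less to run than its headline size suggests.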

Kimi’s attention trick matters more than raw size

One of the more interesting details in the line is Kimi Linear, released in October 2025. That model used Kimi Delta Attention, or KDA, which cuts memory use and speeds up generation when context windows get long. That is exactly the kind of engineering choice that matters once a model is expected to process huge documents or multi-step tasks.

Here is the practical comparison: bigger models grab headlines, but attention efficiency decides whether long-context use is affordable. A model with a 256K-token context window that uses less memory per conversation can serve more conversations at once, handle more documents, and keep latency lower than a model that simply throws more parameters at the problem.
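
A rough back-of-the-envelope calculation shows why memory dominates at long context. In standard attention, the key-value cache grows linearly with the number of tokens, while linear-attention designs generally keep a fixed-size state per layer. The sketch below is generic bookkeeping, not a description of how KDA works internally, and the layer count, head count, and head dimension are hypothetical values chosen for illustration rather than any Kimi model's real configuration.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Memory for a standard attention KV cache for one conversation."""
    # Keys and values are both stored, hence the leading factor of 2.
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value


def fixed_state_bytes(layers, heads, head_dim, bytes_per_value=2):
    """Memory for a generic linear-attention state: one
    head_dim x head_dim matrix per head, independent of context length."""
    return layers * heads * head_dim * head_dim * bytes_per_value


# Hypothetical configuration, assumed for illustration only.
LAYERS, HEADS, HEAD_DIM = 48, 8, 128

for tokens in (128_000, 256_000):
    gib = kv_cache_bytes(tokens, LAYERS, HEADS, HEAD_DIM) / 2**30
    print(f"{tokens:>7} tokens: KV cache ~ {gib:.1f} GiB")

mib = fixed_state_bytes(LAYERS, HEADS, HEAD_DIM) / 2**20
print(f"fixed linear-attention state ~ {mib:.1f} MiB at any length")
```

Under these assumptions, doubling the context doubles the cache for every open conversation, while the fixed state does not grow at all. That difference is the reason attention design, not parameter count, tends to set the price of long-context serving.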

Kimi’s model family now covers several distinct use cases rather than one chatbot experience. The public line includes general chat, research, coding, and agentic task execution, which makes it closer to a product suite than a single assistant.

128K tokens in the first public Kimi release
256K tokens in Kimi-K2-Instruct-0905
1 million rows of input data for OK Computer, Kimi’s agent mode
36 million+ monthly active users for Kimi Explore Edition

That spread also hints at Moonshot’s strategy. Instead of trying to win on one benchmark or one interface, the company keeps widening the product surface: long context for heavy reading, agent modes for task execution, and open-weight releases for developers who want to inspect or adapt the models.

What Kimi says about China’s AI race

Kimi’s story is also a useful snapshot of how fast Chinese AI labs are moving on product design. Moonshot AI is not just shipping a chatbot; it is iterating on model families, agent workflows, and licensing choices at a pace that keeps pressure on bigger Western labs.

The open-weight releases matter here. A model like Kimi K2 under a modified MIT license gives developers and researchers a path to test, fine-tune, and compare without waiting for a closed API to expose every feature. That makes the ecosystem around Kimi more active than a simple consumer app would be.

For developers, the next question is less about whether Kimi can chat and more about where it is cheapest and most useful. If Moonshot keeps improving KDA, expanding context, and tightening agent reliability, Kimi could become the default choice for long-document work and coding-heavy workflows in teams that want open models with strong throughput.

The real test will be whether Kimi K2.7 Code and the next research agent can keep that pace without turning the product into a pile of overlapping modes. If Moonshot can keep one clean interface while the models underneath get more capable, Kimi will stay interesting for reasons that go beyond model size.
