Tag
long context
Long context refers to an LLM’s ability to hold and use very large histories in a single pass. It shapes memory design, retrieval, fast-weight updates, and reasoning stability, and it shows up in 1M-token to 2M-token context windows, state-space memory, test-time training (TTT), and agent workflows.
14 articles

Kimi’s long-context push keeps getting bigger
Moonshot AI’s Kimi chatbot keeps expanding its context window, agent features, and model size, with Kimi K2.5 arriving in January 2026.

Kimi K2.6 turns agents into a swarm
Kimi K2.6 is an open-source multimodal agent model for long coding runs, UI generation, and swarm-style task orchestration.

Self-host MiniMax M3 on GPU cloud
MiniMax M3 brings a 229.9B-parameter MoE model, 1M-token context, and multimodal output, but it needs serious GPU memory to run.

Gemma 4 brings 256K context to open models
Google’s Gemma 4 adds text, image, and audio input, plus up to 256K context and five model sizes for local or server use.

MiniMax M3 adds 1M-token coding power
MiniMax M3 brings coding and agent features, a 1-million-token context window, and multimodal input to the company’s flagship model.

2026 LLM paper lists are a better research tool than feeds
Curated LLM paper lists beat raw feeds because they turn scattered research into usable context.

Best Kimi Models in 2026: K2.5 vs K2 Thinking
Kimi K2.5 leads Moonshot AI’s 2026 lineup with 256K context, 1T parameters, Agent Swarm Mode, and low API pricing.

Gemini turns Google’s AI stack into one app
A developer’s breakdown of Gemini’s rollout, its model tiers, and why Google folded Search, the Gemini app, and Vertex AI into one AI surface.

MiniMax-M1 brings 1M-token open reasoning model
MiniMax released M1, an open-source reasoning model with a 1M-token context window, 80K-token output, and low-cost API pricing.

Sessa: Attention and State-Space Memory for Long Context
Sessa mixes attention with recurrent state-space feedback to improve long-context recall, with power-law memory tails and strong benchmark results.

In-Place TTT Lets LLMs Adapt at Inference
A new test-time training setup lets LLMs update fast weights in place, aiming for better long-context adaptation without full retraining.

Grok 4.20: xAI's new flagship model explained
xAI’s Grok 4.20 adds a 2M-token context window, multi-agent reasoning, and API pricing from $2 per million input tokens.

Gemini 3.1 Pro: Google’s new top model in numbers
Gemini 3.1 Pro posts 77.1% on ARC-AGI-2, 94.3% on GPQA Diamond, and a 1M-token context window, while keeping Gemini 3 pricing.

Universal YOCO aims to scale depth without cache bloat
YOCO-U mixes recursive computation with efficient attention to scale LLM depth while keeping inference overhead and KV cache growth in check.