Tag

long context

Long context is an LLM’s ability to hold and use a very large input history in a single pass. It shapes memory design, retrieval, fast-weight updates, and reasoning stability, and it shows up in 1M to 2M token context windows, state-space memory, test-time training (TTT), and agent workflows.

14 articles

Kimi’s long-context push keeps getting bigger
Model Releases/Jun 24

Moonshot AI’s Kimi chatbot keeps expanding context, agents, and model size, with Kimi K2.5 arriving in January 2026.

Kimi K2.6 turns agents into a swarm
AI Agent/Jun 19

Kimi K2.6 is an open-source multimodal agent model for long coding runs, UI generation, and swarm-style task orchestration.

Self-host MiniMax M3 on GPU cloud
Model Releases/Jun 18

MiniMax M3 brings 229.9B MoE weights, 1M context, and multimodal output, but it needs serious GPU memory to run.

Gemma 4 brings 256K context to open models
Model Releases/Jun 17

Google’s Gemma 4 adds text, image, and audio input, plus up to 256K context and five model sizes for local or server use.

MiniMax M3 adds 1M-token coding power
Model Releases/Jun 13

MiniMax M3 brings coding and agent features, a 1 million-token context window, and multimodal input to the company’s flagship model.

2026 LLM paper lists are a better research tool than feeds
Research/Jun 12

Curated LLM paper lists beat raw feeds because they turn scattered research into usable context.

Best Kimi Models in 2026: K2.5 vs K2 Thinking
Model Releases/Jun 7

Kimi K2.5 leads Moonshot AI’s 2026 lineup with 256K context, 1T parameters, Agent Swarm Mode, and low API pricing.

Gemini turns Google’s AI stack into one app
Tools & Apps/Jun 6

A developer’s breakdown of Gemini’s rollout, model tiers, and why Google folded search, app, and Vertex AI into one AI surface.

MiniMax-M1 brings 1M-token open reasoning model
Model Releases/May 15

MiniMax released M1, an open-source reasoning model with a 1M-token context window, 80K-token output, and low-cost API pricing.

Sessa: Attention and State-Space Memory for Long Context
Research/Apr 21

Sessa mixes attention with recurrent state-space feedback to improve long-context recall, with power-law memory tails and strong benchmark results.

In-Place TTT Lets LLMs Adapt at Inference
Research/Apr 8

A new test-time training setup lets LLMs update fast weights in place, aiming for better long-context adaptation without full retraining.

Grok 4.20: xAI's new flagship model explained
Model Releases/Apr 3

xAI’s Grok 4.20 adds a 2M-token context window, multi-agent reasoning, and API pricing from $2 per million input tokens.

Gemini 3.1 Pro: Google’s new top model in numbers
Model Releases/Apr 3

Gemini 3.1 Pro posts 77.1% on ARC-AGI-2, 94.3% on GPQA Diamond, and a 1M-token context window, while keeping Gemini 3 pricing.

Universal YOCO aims to scale depth without cache bloat
Research/Apr 2

YOCO-U mixes recursive computation with efficient attention to scale LLM depth while keeping inference overhead and KV cache growth in check.