Tag
mixture-of-experts
9 articles

Kimi’s long-context push keeps getting bigger
Moonshot AI’s Kimi chatbot keeps expanding its context window, agent features, and model size, with Kimi K2.5 arriving in January 2026.

Kimi K2.7-Code Adds HighSpeed Mode, Skips Benchmarks
Moonshot’s Kimi K2.7-Code adds a faster mode and cuts token use, but only Moonshot’s own benchmarks back the claims.

Self-host MiniMax M3 on GPU cloud
MiniMax M3 brings 229.9B MoE weights, 1M context, and multimodal output, but it needs serious GPU memory to run.

MiMo-V2-Flash hits top open-source SWE-bench scores
Xiaomi’s MiMo-V2-Flash leads open-source models on SWE-bench, and OpenRouter lists it at $0.10/$0.30 per 1M tokens.

NVIDIA Nemotron 3 Ultra proves open models can still compete
Nemotron 3 Ultra shows that open-weight models can still match top rivals while running far faster.

HANDOFF makes humanoid control more planner-friendly
HANDOFF gives humanoid robots a compact control interface and distills three specialists into one controller.

UniPool shares MoE experts across layers
UniPool replaces per-layer MoE experts with one shared pool, cutting redundancy and improving validation loss in five LLaMA-scale models.

Sebastian Raschka’s LLM Architecture Gallery
Raschka’s gallery compares GPT-2, Llama 3, OLMo 2, DeepSeek, and Qwen stacks with exact layer, cache, and attention data.

Cursor Composer 2 Bets on Agentic Coding
Cursor’s Composer 2 posts 61.3 on CursorBench and 61.7 on Terminal-Bench 2.0, with pricing aimed at high-volume coding teams.