Tag

mixture-of-experts

9 articles

Kimi’s long-context push keeps getting bigger

Moonshot AI’s Kimi chatbot keeps expanding its context window, agent features, and model size, with Kimi K2.5 arriving in January 2026.

Moonshot’s Kimi K2.7-Code adds a faster mode and lower token use, but only Moonshot’s own benchmarks back the claims.

MiniMax M3 brings 229.9B MoE weights, 1M context, and multimodal output, but it needs serious GPU memory to run.

Xiaomi’s MiMo-V2-Flash tops open-source SWE-bench scores while OpenRouter lists it at $0.10/$0.30 per 1M tokens.

Nemotron 3 Ultra shows that open-weight models can still match top rivals while running far faster.

HANDOFF gives humanoid robots a compact control interface and distills three specialists into one controller.

UniPool replaces per-layer MoE experts with one shared pool, cutting redundancy and improving validation loss in five LLaMA-scale models.

Raschka’s gallery compares GPT-2, Llama 3, OLMo 2, DeepSeek, and Qwen stacks with exact layer, cache, and attention data.

Cursor’s Composer 2 posts 61.3 on CursorBench and 61.7 on Terminal-Bench 2.0, with pricing aimed at high-volume coding teams.