Tag
multimodal models
7 articles

Three multimodal models now work in Claude Code
Three multimodal models now plug into Claude Code and other clients through OpenAI-style settings.

K2.6 turns Kimi into a better default
I break down Kimi K2, K2.5, and K2.6, then give you a copy-ready model-selection template for real projects.

5 things to know about Meta’s Llama 3 rollout
What the US and EU rollout of Meta’s Llama 3 means in practice, including model sizes, regional limits, and developer access.

IPT helps VLMs reason about hidden space
Imaginative Perception Tokens improve multimodal models’ ability to reason about unseen spatial structure.

ATLAS makes visual reasoning use one token
ATLAS uses one discrete token for both agentic and latent visual reasoning, aiming to cut overhead without changing standard training.

Why OpenAI API pricing is a product strategy, not a footnote
Teams should treat OpenAI API pricing as part of their product strategy rather than as an afterthought.

Xiaomi’s MiMo trio targets agents, robots, and voice
Xiaomi released three MiMo models for agents, multimodal tasks, and speech. MiMo-V2-Pro nears Claude Opus 4.6 on key benchmarks.