23 models · 10 providers · 9 open source · up to 10M context
| Model | Notes | Provider | Released | Context | Pricing ($/1M, in/out) | Arena ELO | Coding | Reasoning | Speed | License |
|---|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 | #1 Arena Hard Prompts & Coding; 128K max output | Anthropic | 2026-02 | 1M | $5 / $25 | 1561 | 80.8% SWE-bench | 65.4% Terminal-Bench | ~40 t/s | Closed |
| GPT-5.4 | Unifies Codex + GPT; 1M context; built-in computer use | OpenAI | 2026-03 | 1M | $3 / $15 | 1560 | unknown | unknown | ~50 t/s | Closed |
| Grok 4 | Top-5 Arena; strong reasoning & real-time X data | xAI | 2026-01 | 256K | $5 / $25 | 1530 | unknown | unknown | ~45 t/s | Closed |
| DeepSeek R1 | 671B MoE (37B active); MIT license; distilled variants available | DeepSeek | 2025-01 | 128K | $0.55 / $2.19 | 1500 | unknown | #1 Math & Coding Arena | ~45 t/s | OSS |
| Gemini 3.1 Flash Lite | #3 Arena overall; #1 creative writing; ultra-fast | Google | 2026-03 | 1M | $0.10 / $0.40 | 1492 | unknown | unknown | ~200 t/s | Closed |
| Gemini 2.5 Pro | Thinking model; top WebDev Arena (1415); native multimodal | Google | 2025-03 | 1M | $1.25 / $10 | 1470 | 75.6% LiveCodeBench | 84.6% GPQA Diamond | ~60 t/s | Closed |
| Claude Sonnet 4.6 | Best-value frontier; beats Opus 4.5 in 59% of head-to-head | Anthropic | 2026-02 | 1M | $3 / $15 | 1440 | 79.6% SWE-bench | 72.5% OSWorld | ~80 t/s | Closed |
| Qwen 3 235B | 235B MoE (22B active); Apache 2.0; strongest OSS competitive programming | Alibaba | 2025-04 | 128K | $0.86 / $2 | 1422 | 70.7% LiveCodeBench | 2056 CodeForces ELO | ~65 t/s | OSS |
| Mistral Large 3 | 675B MoE (41B active); Apache 2.0; best cost-efficiency frontier | Mistral | 2025-12 | 256K | $0.5 / $1.5 | 1418 | unknown | 43.9% GPQA Diamond | ~70 t/s | OSS |
| o3 | Strongest OpenAI reasoning model | OpenAI | 2025-04 | 200K | $10 / $40 | 1390 | unknown | unknown | ~30 t/s | Closed |
| Kimi K2 | 1T params; Agent Swarm (100 agents); Modified MIT | Moonshot | 2025-07 | 128K | $0.55 / $2.2 | 1380 | 65.8% SWE-bench | 60.2% BrowseComp | ~50 t/s | OSS |
| DeepSeek V3.2 | ~90% of GPT-5.4 quality at 1/50th the cost; best-value model | DeepSeek | 2026-02 | 128K | $0.28 / $0.42 | 1380 | unknown | unknown | ~80 t/s | OSS |
| Grok 3 | Strong math/science; now legacy (Grok 4 series launched) | xAI | 2025-02 | 131K | $3 / $15 | 1370 | unknown | 93.3% AIME 2025 | ~55 t/s | Closed |
| Claude Opus 4.5 | Major price cut from Opus 4; strong agentic coding | Anthropic | 2025-11 | 200K | $5 / $25 | 1349 | 80.9% SWE-bench | unknown | ~35 t/s | Closed |
| GPT-4o | Legacy but still available; superseded by GPT-5 family | OpenAI | 2024-05 | 128K | $2.5 / $10 | 1340 | 30.8% SWE-bench | unknown | ~100 t/s | Closed |
| Gemini 2.5 Flash | Cheapest frontier model at scale | Google | 2025-03 | 1M | $0.30 / $2.50 | 1330 | unknown | unknown | ~150 t/s | Closed |
| Claude Haiku 4.5 | Fastest Claude; cheapest tier | Anthropic | 2025-10 | 200K | $0.8 / $4 | 1290 | unknown | unknown | ~120 t/s | Closed |
| Claude Opus 4.7 | New tokenizer inflates token counts by 35–45% | Anthropic | — | 128K | N/A | — | — | — | — | Closed |
| Devstral 2 | Cheapest agentic coding model; 256K context | Mistral | 2026-01 | 256K | $0.05 / $0.22 | N/A | unknown | unknown | ~100 t/s | OSS |
| Llama 4 Maverick | 400B MoE (17B active); strong multimodal; open weights | Meta | 2025-04 | 1M | $0.15 / $0.6 | N/A | unknown | unknown | ~60 t/s | OSS |
| MiMo V2 | Free coding model; 256K context; open weights | Xiaomi | 2026-02 | 256K | Free | N/A | unknown | unknown | ~70 t/s | OSS |
| Muse Spark | Reasoning at over an order of magnitude less compute than Llama 4 Maverick | Meta | N/A | N/A | — | — | — | — | — | Closed |
| Llama 4 Scout | 10M context industry record; 109B MoE (17B active) | Meta | 2025-04 | 10M | $0.08 / $0.3 | N/A | unknown | unknown | ~90 t/s | OSS |
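The tokenizer note on Claude Opus 4.7 matters for price comparisons: if a tokenizer emits 35–45% more tokens for the same text, the effective per-token price rises by the same factor, so listed prices are not directly comparable across tokenizers. A minimal sketch of that adjustment (the inflation range comes from the table; the $10/1M price is purely hypothetical, since Opus 4.7 pricing is listed as N/A):

```python
def effective_price(per_million: float, inflation: float) -> float:
    """Scale a $/1M-token price by tokenizer inflation.

    inflation=0.40 means the tokenizer emits 40% more tokens for the
    same text, so the cost of processing that text rises by 40%.
    """
    return per_million * (1 + inflation)

# Hypothetical $10/1M price under the table's 35-45% inflation range:
low = effective_price(10.0, 0.35)   # ~13.5
high = effective_price(10.0, 0.45)  # ~14.5
```

The same scaling applies to context limits: a 128K window filled with 40%-inflated tokens holds roughly 128K / 1.4 ≈ 91K tokens' worth of text under a conventional tokenizer.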

Benchmark figures are approximate and sourced from public leaderboards (LMSYS Chatbot Arena) and official provider documentation. Pricing is shown as input/output cost per 1M tokens. Speed is an estimated tokens-per-second figure and varies by provider.
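Per-1M-token pricing converts to a per-request dollar cost as (input_tokens × input_price + output_tokens × output_price) / 1,000,000. A quick sketch using the table's DeepSeek V3.2 prices ($0.28 / $0.42); the request sizes are illustrative:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Dollar cost of one request, given $/1M-token input/output prices."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# An 8K-token prompt with a 1K-token reply on DeepSeek V3.2:
cost = request_cost(8_000, 1_000, 0.28, 0.42)  # ~$0.00266
```

Note that output tokens typically cost 3–8x more than input tokens across the table, so long completions dominate the bill even for short prompts.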