# LLM Comparison 2026

Compare leading models across context window, pricing, Arena score, coding, and reasoning in a single sortable table.
Providers covered: 10 · Open-source models: 9 · Largest context: 10M · Models listed: 23
| Model | Provider | Released | Context | Pricing ($/1M, in/out) | Arena ELO | Coding | Reasoning | Speed | License |
|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 (#1 Arena Hard Prompts & Coding; 128K max output) | Anthropic | 2026-02 | 1M | $5/$25 | 1561 | 80.8% SWE-bench | 65.4% Terminal-Bench | ~40 t/s | Closed |
| GPT-5.4 (unifies Codex + GPT; built-in computer use) | OpenAI | 2026-03 | 1M | $3/$15 | 1560 | unknown | unknown | ~50 t/s | Closed |
| Grok 4 (top-5 Arena; strong reasoning & real-time X data) | xAI | 2026-01 | 256K | $5/$25 | 1530 | unknown | unknown | ~45 t/s | Closed |
| DeepSeek R1 (671B MoE, 37B active; MIT license; distilled variants available) | DeepSeek | 2025-01 | 128K | $0.55/$2.19 | 1500 | unknown | #1 Math & Coding Arena | ~45 t/s | OSS |
| Gemini 3.1 Flash Lite (#3 Arena overall; #1 creative writing; ultra-fast) | Google | 2026-03 | 1M | $0.10/$0.40 | 1492 | unknown | unknown | ~200 t/s | Closed |
| Gemini 2.5 Pro (thinking model; top WebDev Arena at 1415; native multimodal) | Google | 2025-03 | 1M | $1.25/$10 | 1470 | 75.6% LiveCodeBench | 84.6% GPQA Diamond | ~60 t/s | Closed |
| Claude Sonnet 4.6 (best-value frontier; beats Opus 4.5 in 59% of head-to-head matchups) | Anthropic | 2026-02 | 1M | $3/$15 | 1440 | 79.6% SWE-bench | 72.5% OSWorld | ~80 t/s | Closed |
| Qwen 3 235B (235B MoE, 22B active; Apache 2.0; strongest OSS competitive programming) | Alibaba | 2025-04 | 128K | $0.86/$2 | 1422 | 70.7% LiveCodeBench | 2056 CodeForces ELO | ~65 t/s | OSS |
| Mistral Large 3 (675B MoE, 41B active; Apache 2.0; best cost-efficiency frontier) | Mistral | 2025-12 | 256K | $0.50/$1.50 | 1418 | unknown | 43.9% GPQA Diamond | ~70 t/s | OSS |
| o3 (strongest OpenAI reasoning model) | OpenAI | 2025-04 | 200K | $10/$40 | 1390 | unknown | unknown | ~30 t/s | Closed |
| Kimi K2 (1T params; Agent Swarm, 100 agents; Modified MIT) | Moonshot | 2025-07 | 128K | $0.55/$2.20 | 1380 | 65.8% SWE-bench | 60.2% BrowseComp | ~50 t/s | OSS |
| DeepSeek V3.2 (~90% of GPT-5.4 quality at 1/50th the cost; best-value model) | DeepSeek | 2026-02 | 128K | $0.28/$0.42 | 1380 | unknown | unknown | ~80 t/s | OSS |
| Grok 3 (strong math/science; now legacy, superseded by the Grok 4 series) | xAI | 2025-02 | 131K | $3/$15 | 1370 | unknown | 93.3% AIME 2025 | ~55 t/s | Closed |
| Claude Opus 4.5 (major price cut from Opus 4; strong agentic coding) | Anthropic | 2025-11 | 200K | $5/$25 | 1349 | 80.9% SWE-bench | unknown | ~35 t/s | Closed |
| GPT-4o (legacy but still available; superseded by the GPT-5 family) | OpenAI | 2024-05 | 128K | $2.50/$10 | 1340 | 30.8% SWE-bench | unknown | ~100 t/s | Closed |
| Gemini 2.5 Flash (cheapest frontier model at scale) | Google | 2025-03 | 1M | $0.30/$2.50 | 1330 | unknown | unknown | ~150 t/s | Closed |
| Claude Haiku 4.5 (fastest Claude; cheapest tier) | Anthropic | 2025-10 | 200K | $0.80/$4 | 1290 | unknown | unknown | ~120 t/s | Closed |
| Claude Opus 4.7 (new tokenizer inflates token counts by 35-45%) | Anthropic | unknown | 128K | N/A | N/A | unknown | unknown | unknown | Closed |
| Devstral 2 (cheapest agentic coding model) | Mistral | 2026-01 | 256K | $0.05/$0.22 | N/A | unknown | unknown | ~100 t/s | OSS |
| Llama 4 Maverick (400B MoE, 17B active; strong multimodal; open weights) | Meta | 2025-04 | 1M | $0.15/$0.60 | N/A | unknown | unknown | ~60 t/s | OSS |
| MiMo V2 (free coding model; open weights) | Xiaomi | 2026-02 | 256K | Free | N/A | unknown | unknown | ~70 t/s | OSS |
| Muse Spark (reasoning at over an order of magnitude less compute than Llama 4 Maverick) | Meta | unknown | N/A | N/A | N/A | unknown | unknown | unknown | Closed |
| Llama 4 Scout (industry-record 10M context; 109B MoE, 17B active) | Meta | 2025-04 | 10M | $0.08/$0.30 | N/A | unknown | unknown | ~90 t/s | OSS |
Legend: rows are color-coded by Arena ELO (1380+, 1350–1379, <1350); gold marks the best value in each column.
Benchmark figures are approximate and sourced from public leaderboards (LMSYS Chatbot Arena, official docs). Pricing is shown as input/output cost per 1M tokens. Speed is estimated tokens/sec and varies by provider. Data is auto-refreshed from the database.
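The per-1M-token pricing in the table converts directly into a per-request cost estimate. A minimal sketch (the `request_cost` helper is hypothetical, and the rates used below are just the table's figures, not live provider pricing):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_rate: float, out_rate: float) -> float:
    """Cost in USD for one request, given $/1M-token input and output rates."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: Claude Sonnet 4.6 at $3 input / $15 output per 1M tokens,
# for a request with 20,000 input tokens and 2,000 output tokens:
cost = request_cost(20_000, 2_000, 3.0, 15.0)
print(f"${cost:.3f}")  # (20,000*3 + 2,000*15) / 1,000,000 = $0.090
```

Output-token pricing usually dominates for long generations, which is why the cheapest input rate alone is a poor proxy for total cost.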