LLM Comparison 2026
Compare Claude, GPT, Gemini, Llama, and Mistral side by side on context window, pricing, Arena rank, and coding and reasoning scores, all in one sortable table so you can pick the right model fast.
Providers covered: 13
Open-source models: 15
Largest context: 10M
41 models
| Model | Provider | Released | Context | Pricing ($/1M tokens, input/output) | Arena ELO | Coding | Reasoning | Speed | License |
|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.7: New tokenizer that inflates token counts by 35-45%. | Anthropic | | 128K | | 1567 | | | | Closed |
| GPT-5.4: Unifies Codex + GPT; 1M context; built-in computer use | OpenAI | 2026-03 | 1M | $3/$15 | 1560 | unknown | unknown | ~50 t/s | Closed |
| Claude Opus 4.6: #1 Arena Hard Prompts & Coding; 128K max output | Anthropic | 2026-02 | 1M | $5/$25 | 1549 | 80.8% SWE-bench | 65.4% Terminal-Bench | ~40 t/s | Closed |
| Gemini 3.1 Flash Lite: #3 Arena overall; #1 creative writing; ultra-fast | Google | 2026-03 | 1M | $0.1/$0.4 | 1492 | unknown | unknown | ~200 t/s | Closed |
| Claude Sonnet 4.6: Best value frontier; beats Opus 4.5 in 59% head-to-head | Anthropic | 2026-02 | 1M | $3/$15 | 1440 | 79.6% SWE-bench | 72.5% OSWorld | ~80 t/s | Closed |
| Qwen 3 235B: 235B MoE (22B active); Apache 2.0; strongest OSS competitive programming | Alibaba | 2025-04 | 128K | $0.86/$2 | 1422 | 70.7% LiveCodeBench | 2056 CodeForces ELO | ~65 t/s | OSS |
| Mistral Large 3: 675B MoE (41B active); Apache 2.0; best cost-efficiency frontier | Mistral | 2025-12 | 256K | $0.5/$1.5 | 1418 | unknown | 43.9% GPQA Diamond | ~70 t/s | OSS |
| Claude Opus 4.5: Major price cut from Opus 4; strong agentic coding | Anthropic | 2025-11 | 200K | $5/$25 | 1380 | 80.9% SWE-bench | unknown | ~35 t/s | Closed |
| Kimi K2: 1T params; Agent Swarm (100 agents); Modified MIT | Moonshot | 2025-07 | 128K | $0.55/$2.2 | 1380 | 65.8% SWE-bench | 60.2% BrowseComp | ~50 t/s | OSS |
| DeepSeek V3.2: ~90% GPT-5.4 quality at 1/50th cost; best value model | DeepSeek | 2026-02 | 128K | $0.14/$0.42 | 1380 | unknown | unknown | ~80 t/s | OSS |
| DeepSeek R1: 671B MoE (37B active); MIT license; distilled variants available | DeepSeek | 2025-01 | 128K | $0.55/$2.19 | 1368 | unknown | #1 Math & Coding Arena | ~45 t/s | OSS |
| o3: Strongest OpenAI reasoning model | OpenAI | 2025-04 | 200K | $10/$40 | 1365 | unknown | unknown | ~30 t/s | Closed |
| Gemini 2.5 Flash: Cheapest frontier model at scale | Google | 2025-03 | 1M | $0.30/$2.50 | 1362 | unknown | unknown | ~150 t/s | Closed |
| Grok 4: Top-5 Arena; strong reasoning & real-time X data | xAI | 2026-01 | 256K | $5/$25 | 1340 | unknown | unknown | ~45 t/s | Closed |
| Gemini 2.5 Pro: Thinking model; top WebDev Arena 1415; native multimodal | Google | 2025-03 | 1M | $1.25/$10 | 1310 | 75.6% LiveCodeBench | 84.6% GPQA Diamond | ~60 t/s | Closed |
| Grok 3: Strong math/science; now legacy (Grok 4 series launched) | xAI | 2025-02 | 131K | $3/$15 | 1298 | unknown | 93.3% AIME 2025 | ~55 t/s | Closed |
| Claude Haiku 4.5: Fastest Claude, cheapest tier | Anthropic | 2025-10 | 200K | $0.8/$4 | 1290 | unknown | unknown | ~120 t/s | Closed |
| GPT-4o: Legacy but still available; superseded by GPT-5 family | OpenAI | 2024-05 | 128K | $2.5/$10 | 1285 | 30.8% SWE-bench | unknown | ~100 t/s | Closed |
| GPT-5.6 Sol: Frontier preview model that reportedly leads on coding and cybersecurity benchmarks. | OpenAI | | 128K | $0/$0 | N/A | | | | Closed |
| Gemma 4: Google’s latest open model family entry surfaced in 2026 benchmark roundups for self-hosted use. | Google | | 128K | $0.00/$0.00 | N/A | | | | OSS |
| GPT-5.3-Codex: Coding-oriented OpenAI model referenced in June 2026 pricing coverage with strong developer-task performance. | OpenAI | | 128K | $1.75/$14 | N/A | | | | Closed |
| Llama 4 Maverick: 400B MoE (17B active); strong multimodal; open weights | Meta | 2025-04 | 1M | $0.15/$0.60 | N/A | unknown | unknown | ~60 t/s | OSS |
| GPT-5.6 Luna: Fastest and lowest-cost GPT-5.6 tier, with performance near GPT-5.5 on several tests. | OpenAI | | 128K | $0/$0 | N/A | | | | Closed |
| DeepSeek V4 Flash: Low-cost DeepSeek variant cited in 2026 token-cost comparisons as a budget option for fast inference. | DeepSeek | | 128K | $0.14/$0.28 | N/A | | | | OSS |
| Devstral 2: Cheapest agentic coding model; 256K context | Mistral | 2026-01 | 256K | $0.05/$0.22 | N/A | unknown | unknown | ~100 t/s | OSS |
| Gemini 3.2 Flash: Faster than Gemini 3.1 Pro with improved performance. | Google | | 128K | | N/A | | | | Closed |
| MiMo V2: Free coding model; 256K context; open weights | Xiaomi | 2026-02 | 256K | Free | N/A | unknown | unknown | ~70 t/s | OSS |
| Muse Spark: Achieves its reasoning capabilities using over an order of magnitude less compute than Llama 4 Maverick. | Meta | | N/A | | N/A | | | | Closed |
| Gemini 3.5 Flash: GA on May 19, 2026; faster than prior frontier models and strong on coding and agentic benchmarks. | Google | | 1M | $1.50/$9 | N/A | | | | Closed |
| GPT-4.1 Nano: Ultra-low-cost OpenAI model highlighted in 2026 pricing comparisons for lightweight agent and utility workloads. | OpenAI | | 128K | $0.10/$0.40 | N/A | | | | Closed |
| Kimi K2.7 Code: Moonshot’s new coding-focused MoE model with a large context window and competitive pricing. | Moonshot AI | | 256K | $0.60/$2.40 | N/A | | | | Closed |
| Claude Fable 5: Anthropic’s newly released flagship Claude model, positioned for high-end reasoning and long-context agentic work. | Anthropic | | 1M | $10/$50 | N/A | | | | Closed |
| Qwen 3.6: Newer Qwen-family open model mentioned in 2026 open-source rankings for strong general-purpose performance. | Alibaba | | 128K | $0.20/$0.80 | N/A | | | | OSS |
| GPT-5.5: OpenAI's latest model, offering advanced capabilities for coding and complex tasks. | OpenAI | | 128K | | N/A | | | | Closed |
| DeepSeek V4-Pro: Offers a significant price reduction, making it one of the most cost-effective options in the market. | DeepSeek | | 128K | | N/A | | | | Closed |
| DeepSeek V4: A cost-focused DeepSeek release that recent leaderboards describe as a strong value option for reasoning and general use. | DeepSeek | | 128K | $0.435/$0.87 | N/A | | | | OSS |
| Claude Opus 4.8: Improves coding, reasoning, reliability, and agentic workflows while keeping standard API pricing unchanged from Opus 4.7. | Anthropic | | 200K | $15/$75 | N/A | | | | Closed |
| GLM-5.2: Open-weights coding-focused model released in June 2026, positioned for long-horizon autonomous engineering tasks. | Z.ai | | 128K | $3/$80 | N/A | | | | OSS |
| Llama 4 Scout: 10M context industry record; 109B MoE (17B active) | Meta | 2025-04 | 10M | $0.08/$0.3 | N/A | unknown | unknown | ~90 t/s | OSS |
| MiniMax M3: Open-weight multimodal model with a one-million-token context window and strong coding performance. | MiniMax | | 1M | | N/A | | | | OSS |
| GPT-5.6 Terra: Lower-cost GPT-5.6 tier positioned to halve costs versus the top model. | OpenAI | | 128K | $0/$0 | N/A | | | | Closed |
Legend: Arena ELO tiers are 1380+, 1350–1379, and <1350. Gold = best in column.
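To give a feel for what a gap in Arena ELO means, here is a minimal sketch that converts a rating difference into an expected head-to-head win rate. It assumes the classic Elo logistic formula with a 400-point scale; the leaderboard's actual rating method may differ, and the function name `expected_win_rate` is illustrative only.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected share of head-to-head wins for A over B.

    Assumption: classic Elo model (logistic curve, 400-point scale).
    The Arena leaderboard may compute its ratings differently.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Ratings taken from the table: Claude Sonnet 4.6 (1440) vs Claude Opus 4.5 (1380).
sonnet_46, opus_45 = 1440, 1380
print(f"{expected_win_rate(sonnet_46, opus_45):.1%}")
```

Under this assumption, the 60-point gap between Claude Sonnet 4.6 and Claude Opus 4.5 works out to a win rate a little under 59%, which is in line with the head-to-head figure quoted in the Sonnet 4.6 row. Small rating gaps therefore correspond to only a modest edge in pairwise preference.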
Benchmark figures are approximate and sourced from public leaderboards (LMSYS Chatbot Arena, official docs). Pricing is shown as input/output price in $ per 1M tokens. Speed is estimated in tokens per second (t/s) and varies by provider. Data is refreshed automatically from a database.
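As a rough illustration of how to read the pricing column, the sketch below turns a cell such as `$3/$15` into an estimated cost for one request. The helper names `parse_price` and `request_cost` are hypothetical, and the sketch assumes the cell always has the `$input/$output` shape; cells like `Free`, `N/A`, or blanks would need separate handling.

```python
from decimal import Decimal


def parse_price(cell: str) -> tuple[Decimal, Decimal]:
    """Split a pricing cell like "$3/$15" into (input, output) price per 1M tokens."""
    input_part, output_part = cell.split("/")
    return (
        Decimal(input_part.strip().lstrip("$")),
        Decimal(output_part.strip().lstrip("$")),
    )


def request_cost(cell: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request from a pricing cell and token counts."""
    in_price, out_price = parse_price(cell)
    # Prices are quoted per 1M tokens, so scale the weighted sum down.
    return (in_price * input_tokens + out_price * output_tokens) / Decimal(1_000_000)


# Example: a 200K-token prompt with a 10K-token reply at $3/$15.
print(request_cost("$3/$15", input_tokens=200_000, output_tokens=10_000))
```

`Decimal` is used rather than `float` so that small per-token prices do not pick up binary rounding noise. Because output tokens are priced several times higher than input tokens for most models listed, the output side can dominate the bill for generation-heavy workloads even when the prompt is much longer.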