
LLM Comparison 2026

Compare Claude, GPT, Gemini, Llama, and Mistral side by side: context windows, pricing, Arena rank, and coding and reasoning scores, all sortable in one table so you can pick the right model fast.

Providers covered: 13
Open-source models: 15
Largest context: 10M

41 models
| Model | Notes | Provider | Released | Context | Pricing (input/output, $ per 1M tokens) | Arena ELO | Coding | Reasoning | Speed | License |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Claude Opus 4.7 | Opus 4.7 features a new tokenizer that inflates token counts by 35-45%. | Anthropic | - | 128K | - | 1567 | - | - | - | Closed |
| GPT-5.4 | Unifies Codex + GPT; 1M context; built-in computer use | OpenAI | 2026-03 | 1M | $3/$15 | 1560 | unknown | unknown | ~50 t/s | Closed |
| Claude Opus 4.6 | #1 Arena Hard Prompts & Coding; 128K max output | Anthropic | 2026-02 | 1M | $5/$25 | 1549 | 80.8% SWE-bench | 65.4% Terminal-Bench | ~40 t/s | Closed |
| Gemini 3.1 Flash Lite | #3 Arena overall; #1 creative writing; ultra-fast | Google | 2026-03 | 1M | $0.1/$0.4 | 1492 | unknown | unknown | ~200 t/s | Closed |
| Claude Sonnet 4.6 | Best value frontier; beats Opus 4.5 in 59% head-to-head | Anthropic | 2026-02 | 1M | $3/$15 | 1440 | 79.6% SWE-bench | 72.5% OSWorld | ~80 t/s | Closed |
| Qwen 3 235B | 235B MoE (22B active); Apache 2.0; strongest OSS competitive programming | Alibaba | 2025-04 | 128K | $0.86/$2 | 1422 | 70.7% LiveCodeBench | 2056 CodeForces ELO | ~65 t/s | OSS |
| Mistral Large 3 | 675B MoE (41B active); Apache 2.0; best cost-efficiency frontier | Mistral | 2025-12 | 256K | $0.5/$1.5 | 1418 | unknown | 43.9% GPQA Diamond | ~70 t/s | OSS |
| Claude Opus 4.5 | Major price cut from Opus 4; strong agentic coding | Anthropic | 2025-11 | 200K | $5/$25 | 1380 | 80.9% SWE-bench | unknown | ~35 t/s | Closed |
| Kimi K2 | 1T params; Agent Swarm (100 agents); Modified MIT | Moonshot | 2025-07 | 128K | $0.55/$2.2 | 1380 | 65.8% SWE-bench | 60.2% BrowseComp | ~50 t/s | OSS |
| DeepSeek V3.2 | ~90% GPT-5.4 quality at 1/50th cost; best value model | DeepSeek | 2026-02 | 128K | $0.14/$0.42 | 1380 | unknown | unknown | ~80 t/s | OSS |
| DeepSeek R1 | 671B MoE (37B active); MIT license; distilled variants available | DeepSeek | 2025-01 | 128K | $0.55/$2.19 | 1368 | unknown | #1 Math & Coding Arena | ~45 t/s | OSS |
| o3 | Strongest OpenAI reasoning model | OpenAI | 2025-04 | 200K | $10/$40 | 1365 | unknown | unknown | ~30 t/s | Closed |
| Gemini 2.5 Flash | Cheapest frontier model at scale | Google | 2025-03 | 1M | $0.30/$2.50 | 1362 | unknown | unknown | ~150 t/s | Closed |
| Grok 4 | Top-5 Arena; strong reasoning & real-time X data | xAI | 2026-01 | 256K | $5/$25 | 1340 | unknown | unknown | ~45 t/s | Closed |
| Gemini 2.5 Pro | Thinking model; top WebDev Arena 1415; native multimodal | Google | 2025-03 | 1M | $1.25/$10 | 1310 | 75.6% LiveCodeBench | 84.6% GPQA Diamond | ~60 t/s | Closed |
| Grok 3 | Strong math/science; now legacy (Grok 4 series launched) | xAI | 2025-02 | 131K | $3/$15 | 1298 | unknown | 93.3% AIME 2025 | ~55 t/s | Closed |
| Claude Haiku 4.5 | Fastest Claude, cheapest tier | Anthropic | 2025-10 | 200K | $0.8/$4 | 1290 | unknown | unknown | ~120 t/s | Closed |
| GPT-4o | Legacy but still available; superseded by GPT-5 family | OpenAI | 2024-05 | 128K | $2.5/$10 | 1285 | 30.8% SWE-bench | unknown | ~100 t/s | Closed |
| GPT-5.6 Sol | Frontier preview model that reportedly leads on coding and cybersecurity benchmarks. | OpenAI | - | 128K | $0/$0 | N/A | - | - | - | Closed |
| Gemma 4 | Google’s latest open model family entry surfaced in 2026 benchmark roundups for self-hosted use. | Google | - | 128K | $0.00/$0.00 | N/A | - | - | - | OSS |
| GPT-5.3-Codex | Coding-oriented OpenAI model referenced in June 2026 pricing coverage with strong developer-task performance. | OpenAI | - | 128K | $1.75/$14 | N/A | - | - | - | Closed |
| Llama 4 Maverick | 400B MoE (17B active); strong multimodal; open weights | Meta | 2025-04 | 1M | $0.15/$0.60 | N/A | unknown | unknown | ~60 t/s | OSS |
| GPT-5.6 Luna | Fastest and lowest-cost GPT-5.6 tier, with performance near GPT-5.5 on several tests. | OpenAI | - | 128K | $0/$0 | N/A | - | - | - | Closed |
| DeepSeek V4 Flash | Low-cost DeepSeek variant cited in 2026 token-cost comparisons as a budget option for fast inference. | DeepSeek | - | 128K | $0.14/$0.28 | N/A | - | - | - | OSS |
| Devstral 2 | Cheapest agentic coding model; 256K context | Mistral | 2026-01 | 256K | $0.05/$0.22 | N/A | unknown | unknown | ~100 t/s | OSS |
| Gemini 3.2 Flash | Faster than Gemini 3.1 Pro with improved performance. | Google | - | 128K | - | N/A | - | - | - | Closed |
| MiMo V2 | Free coding model; 256K context; open weights | Xiaomi | 2026-02 | 256K | Free | N/A | unknown | unknown | ~70 t/s | OSS |
| Muse Spark | Muse Spark achieves its reasoning capabilities using over an order of magnitude less compute than Llama 4 Maverick. | Meta | - | N/A | - | N/A | - | - | - | Closed |
| Gemini 3.5 Flash | GA on May 19, 2026; faster than prior frontier models and strong on coding and agentic benchmarks. | Google | - | 1M | $1.50/$9 | N/A | - | - | - | Closed |
| GPT-4.1 Nano | Ultra-low-cost OpenAI model highlighted in 2026 pricing comparisons for lightweight agent and utility workloads. | OpenAI | - | 128K | $0.10/$0.40 | N/A | - | - | - | Closed |
| Kimi K2.7 Code | Moonshot’s new coding-focused MoE model with a large context window and competitive pricing. | Moonshot AI | - | 256K | $0.60/$2.40 | N/A | - | - | - | Closed |
| Claude Fable 5 | Anthropic’s newly released flagship Claude model, positioned for high-end reasoning and long-context agentic work. | Anthropic | - | 1M | $10/$50 | N/A | - | - | - | Closed |
| Qwen 3.6 | Newer Qwen-family open model mentioned in 2026 open-source rankings for strong general-purpose performance. | Alibaba | - | 128K | $0.20/$0.80 | N/A | - | - | - | OSS |
| GPT-5.5 | OpenAI's latest model, GPT-5.5, offers advanced capabilities for coding and complex tasks. | OpenAI | - | 128K | - | N/A | - | - | - | Closed |
| DeepSeek V4-Pro | DeepSeek V4-Pro offers a significant price reduction, making it one of the most cost-effective options in the market. | DeepSeek | - | 128K | - | N/A | - | - | - | Closed |
| DeepSeek V4 | A cost-focused DeepSeek release that recent leaderboards describe as a strong value option for reasoning and general use. | DeepSeek | - | 128K | $0.435/$0.87 | N/A | - | - | - | OSS |
| Claude Opus 4.8 | Improves coding, reasoning, reliability, and agentic workflows while keeping standard API pricing unchanged from Opus 4.7. | Anthropic | - | 200K | $15/$75 | N/A | - | - | - | Closed |
| GLM-5.2 | Open-weights coding-focused model released in June 2026, positioned for long-horizon autonomous engineering tasks. | Z.ai | - | 128K | $3/$80 | N/A | - | - | - | OSS |
| Llama 4 Scout | 10M context industry record; 109B MoE (17B active) | Meta | 2025-04 | 10M | $0.08/$0.3 | N/A | unknown | unknown | ~90 t/s | OSS |
| MiniMax M3 | Open-weight multimodal model with a one-million-token context window and strong coding performance. | MiniMax | - | 1M | - | N/A | - | - | - | OSS |
| GPT-5.6 Terra | Lower-cost GPT-5.6 tier positioned to halve costs versus the top model. | OpenAI | - | 128K | $0/$0 | N/A | - | - | - | Closed |
Legend: Arena ELO bands are 1380+, 1350–1379, and <1350. Gold = best in column.
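
The Arena ELO bands in the legend can be written as a small lookup. This is a minimal sketch, not the page's own logic: the function name `arena_tier` and the use of `None` for rows listed as N/A are assumptions made for illustration.

```python
def arena_tier(elo):
    """Return the legend band for an Arena ELO value, or None when the model is unrated.

    Hypothetical helper: the band boundaries come from the legend,
    everything else is an illustrative assumption.
    """
    if elo is None:
        return None
    if elo >= 1380:
        return "1380+"
    if elo >= 1350:
        return "1350-1379"
    return "<1350"


# ELO values taken from the table.
print(arena_tier(1560))  # GPT-5.4
print(arena_tier(1365))  # o3
print(arena_tier(1298))  # Grok 3
print(arena_tier(None))  # rows listed as N/A
```

Models tied at a boundary, such as the three rows at 1380, fall into the upper band because the top band is inclusive.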

Benchmark figures are approximate and sourced from public leaderboards (LMSYS Chatbot Arena) and official documentation. Pricing is shown as input/output cost in USD per 1M tokens. Speed is an estimate in tokens per second (t/s) and varies by provider. Data is refreshed automatically from the database.
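
Because prices are quoted per 1M tokens and split into input and output, the cost of a workload is the sum of the two sides scaled by token count. The sketch below shows that arithmetic. The helper names `parse_price` and `estimate_cost` are hypothetical, and it assumes price cells of the form `$x/$y`; cells such as `Free` or missing prices would need separate handling.

```python
def parse_price(cell):
    """Split a price cell such as '$3/$15' into (input, output) USD per 1M tokens.

    Hypothetical helper; assumes the '$x/$y' format used in the table.
    """
    input_part, output_part = cell.split("/")
    return float(input_part.lstrip("$")), float(output_part.lstrip("$"))


def estimate_cost(cell, input_tokens, output_tokens):
    """Estimate the USD cost of a workload from a price cell and token counts."""
    in_price, out_price = parse_price(cell)
    return (input_tokens / 1_000_000) * in_price + (output_tokens / 1_000_000) * out_price


# Claude Sonnet 4.6 is listed at $3/$15.
# 2M input tokens and 0.5M output tokens: 2 * 3 + 0.5 * 15 = 13.50
cost = estimate_cost("$3/$15", input_tokens=2_000_000, output_tokens=500_000)
print(f"${cost:.2f}")
```

Token counts depend on each model's tokenizer, so the same text can cost more on a model whose tokenizer produces more tokens; treat results as rough estimates.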