# LLM Comparison 2026

Compare leading models across context window, pricing, Arena score, coding, and reasoning in a single sortable table.
Providers covered: 10 · Open-source models: 9 · Largest context: 10M · Models listed: 23
| Model | Provider | Released | Context | Pricing ($/1M, in/out) | Arena ELO | Coding | Reasoning | Speed | License |
|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 (#1 Arena Hard Prompts & Coding; 128K max output) | Anthropic | 2026-02 | 1M | $5/$25 | 1561 | 80.8% SWE-bench | 65.4% Terminal-Bench | ~40 t/s | Closed |
| GPT-5.4 (unifies Codex + GPT; built-in computer use) | OpenAI | 2026-03 | 1M | $3/$15 | 1560 | unknown | unknown | ~50 t/s | Closed |
| Grok 4 (top-5 Arena; strong reasoning & real-time X data) | xAI | 2026-01 | 256K | $5/$25 | 1530 | unknown | unknown | ~45 t/s | Closed |
| DeepSeek R1 (671B MoE, 37B active; MIT license; distilled variants available) | DeepSeek | 2025-01 | 128K | $0.55/$2.19 | 1500 | unknown | #1 Math & Coding Arena | ~45 t/s | OSS |
| Gemini 3.1 Flash Lite (#3 Arena overall; #1 creative writing; ultra-fast) | Google | 2026-03 | 1M | $0.10/$0.40 | 1492 | unknown | unknown | ~200 t/s | Closed |
| Gemini 2.5 Pro (thinking model; top WebDev Arena at 1415; native multimodal) | Google | 2025-03 | 1M | $1.25/$10 | 1470 | 75.6% LiveCodeBench | 84.6% GPQA Diamond | ~60 t/s | Closed |
| Claude Sonnet 4.6 (best-value frontier; beats Opus 4.5 in 59% of head-to-head matchups) | Anthropic | 2026-02 | 1M | $3/$15 | 1440 | 79.6% SWE-bench | 72.5% OSWorld | ~80 t/s | Closed |
| Qwen 3 235B (235B MoE, 22B active; Apache 2.0; strongest OSS competitive programming) | Alibaba | 2025-04 | 128K | $0.86/$2 | 1422 | 70.7% LiveCodeBench | 2056 CodeForces ELO | ~65 t/s | OSS |
| Mistral Large 3 (675B MoE, 41B active; Apache 2.0; best cost-efficiency frontier) | Mistral | 2025-12 | 256K | $0.50/$1.50 | 1418 | unknown | 43.9% GPQA Diamond | ~70 t/s | OSS |
| o3 (strongest OpenAI reasoning model) | OpenAI | 2025-04 | 200K | $10/$40 | 1390 | unknown | unknown | ~30 t/s | Closed |
| Kimi K2 (1T params; Agent Swarm, 100 agents; Modified MIT) | Moonshot | 2025-07 | 128K | $0.55/$2.20 | 1380 | 65.8% SWE-bench | 60.2% BrowseComp | ~50 t/s | OSS |
| DeepSeek V3.2 (~90% of GPT-5.4 quality at 1/50th the cost; best-value model) | DeepSeek | 2026-02 | 128K | $0.28/$0.42 | 1380 | unknown | unknown | ~80 t/s | OSS |
| Grok 3 (strong math/science; now legacy, superseded by the Grok 4 series) | xAI | 2025-02 | 131K | $3/$15 | 1370 | unknown | 93.3% AIME 2025 | ~55 t/s | Closed |
| Claude Opus 4.5 (major price cut from Opus 4; strong agentic coding) | Anthropic | 2025-11 | 200K | $5/$25 | 1349 | 80.9% SWE-bench | unknown | ~35 t/s | Closed |
| GPT-4o (legacy but still available; superseded by the GPT-5 family) | OpenAI | 2024-05 | 128K | $2.50/$10 | 1340 | 30.8% SWE-bench | unknown | ~100 t/s | Closed |
| Gemini 2.5 Flash (cheapest frontier model at scale) | Google | 2025-03 | 1M | $0.30/$2.50 | 1330 | unknown | unknown | ~150 t/s | Closed |
| Claude Haiku 4.5 (fastest Claude; cheapest tier) | Anthropic | 2025-10 | 200K | $0.80/$4 | 1290 | unknown | unknown | ~120 t/s | Closed |
| Claude Opus 4.7 (new tokenizer inflates token counts by 35-45%) | Anthropic | unknown | 128K | N/A | N/A | unknown | unknown | unknown | Closed |
| Devstral 2 (cheapest agentic coding model) | Mistral | 2026-01 | 256K | $0.05/$0.22 | N/A | unknown | unknown | ~100 t/s | OSS |
| Llama 4 Maverick (400B MoE, 17B active; strong multimodal; open weights) | Meta | 2025-04 | 1M | $0.15/$0.60 | N/A | unknown | unknown | ~60 t/s | OSS |
| MiMo V2 (free coding model; open weights) | Xiaomi | 2026-02 | 256K | Free | N/A | unknown | unknown | ~70 t/s | OSS |
| Muse Spark (reasoning at over an order of magnitude less compute than Llama 4 Maverick) | Meta | unknown | N/A | N/A | N/A | unknown | unknown | unknown | Closed |
| Llama 4 Scout (industry-record 10M context; 109B MoE, 17B active) | Meta | 2025-04 | 10M | $0.08/$0.30 | N/A | unknown | unknown | ~90 t/s | OSS |
Legend: rows are color-coded by Arena ELO (1380+, 1350–1379, <1350); gold marks the best value in each column.
Benchmark figures are approximate and sourced from public leaderboards (LMSYS Chatbot Arena, official docs). Pricing is shown as input/output cost per 1M tokens. Speed is estimated tokens/sec and varies by provider. Data is auto-refreshed from the database.
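The per-1M-token pricing in the table converts directly into a per-request cost estimate. A minimal sketch (the `request_cost` helper is hypothetical, and the rates used below are just the table's figures, not live provider pricing):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_rate: float, out_rate: float) -> float:
    """Cost in USD for one request, given $/1M-token input and output rates."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: Claude Sonnet 4.6 at $3 input / $15 output per 1M tokens,
# for a request with 20,000 input tokens and 2,000 output tokens:
cost = request_cost(20_000, 2_000, 3.0, 15.0)
print(f"${cost:.3f}")  # (20,000*3 + 2,000*15) / 1,000,000 = $0.090
```

Output-token pricing usually dominates for long generations, which is why the cheapest input rate alone is a poor proxy for total cost.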