[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-llm-leaderboard-2026-300-models-ranked-en":3,"article-related-llm-leaderboard-2026-300-models-ranked-en":30,"series-tools-4065ada8-125b-4286-85c5-85cfe7d6369a":83},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":22,"views":26,"created_at":27,"published_at":28,"topic_cluster_id":29},"4065ada8-125b-4286-85c5-85cfe7d6369a","llm-leaderboard-2026-300-models-ranked-en","LLM Leaderboard 2026: 300+ Models Ranked","\u003Cp data-speakable=\"summary\">LLM Stats ranks 309 AI models by score, speed, and price.\u003C\u002Fp>\u003Cp>The new \u003Ca href=\"https:\u002F\u002Fllm-stats.com\u002Fleaderboards\u002Fllm-leaderboard\" target=\"_blank\" rel=\"noopener\">LLM Leaderboard\u003C\u002Fa> tracks 309 canonical models and updates pricing and performance data on an hourly cadence. It mixes public \u003Ca href=\"\u002Ftag\u002Fbenchmark\">benchmark\u003C\u002Fa> results with live API measurements, which makes it more useful than a static “best model” chart.\u003C\u002Fp>\u003Cp>That matters because the top model for coding is not always the best pick for reasoning, cost, or latency. 
On this board, the leaders split across several metrics: \u003Ca href=\"https:\u002F\u002Fllm-stats.com\u002Fleaderboards\u002Fllm-leaderboard\" target=\"_blank\" rel=\"noopener\">Claude Opus 4.6\u003C\u002Fa> leads coding arena performance, \u003Ca href=\"https:\u002F\u002Fllm-stats.com\u002Fleaderboards\u002Fllm-leaderboard\" target=\"_blank\" rel=\"noopener\">Claude Mythos Preview\u003C\u002Fa> tops GPQA Diamond, and \u003Ca href=\"https:\u002F\u002Fllm-stats.com\u002Fleaderboards\u002Fllm-leaderboard\" target=\"_blank\" rel=\"noopener\">Gemini 3 Pro\u003C\u002Fa> posts a perfect AIME 2025 score.\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>Metric\u003C\u002Fth>\u003Cth>Leader\u003C\u002Fth>\u003Cth>Value\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>Models tracked\u003C\u002Ftd>\u003Ctd>LLM Stats leaderboard\u003C\u002Ftd>\u003Ctd>309\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Best for coding\u003C\u002Ftd>\u003Ctd>Claude Opus 4.6\u003C\u002Ftd>\u003Ctd>21.3 arena score\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Best on GPQA Diamond\u003C\u002Ftd>\u003Ctd>Claude Mythos Preview\u003C\u002Ftd>\u003Ctd>94.6%\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Best on AIME 2025\u003C\u002Ftd>\u003Ctd>Gemini 3 Pro\u003C\u002Ftd>\u003Ctd>100.0%\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Highest throughput\u003C\u002Ftd>\u003Ctd>Mercury 2\u003C\u002Ftd>\u003Ctd>925 tok\u002Fs\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Largest context window\u003C\u002Ftd>\u003Ctd>Grok 4 Fast\u003C\u002Ftd>\u003Ctd>2.0M tokens\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>What LLM Stats is actually ranking\u003C\u002Fh2>\u003Cp>\u003Ca href=\"https:\u002F\u002Fllm-stats.com\" target=\"_blank\" rel=\"noopener\">LLM Stats\u003C\u002Fa> is trying to answer a very practical question: which model should you pay for today? 
Instead of relying on a single benchmark, it combines intelligence signals, output speed, latency, and \u003Ca href=\"\u002Ftag\u002Ftoken\">token\u003C\u002Fa> pricing into one score.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780776189065-qk79.png\" alt=\"LLM Leaderboard 2026: 300+ Models Ranked\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>The leaderboard page also exposes the raw ingredients behind that score. You can sort by organization, parameters, hardware, context window, license, modality, price, country, and speed. That is useful because model choice now depends on deployment constraints as much as benchmark bragging rights.\u003C\u002Fp>\u003Cul>\u003Cli>309 canonical models are tracked\u003C\u002Fli>\u003Cli>Pricing is pulled from public API price lists and checked against billing samples\u003C\u002Fli>\u003Cli>Live performance uses a 7-day rolling average\u003C\u002Fli>\u003Cli>Metadata and pricing revalidate every hour\u003C\u002Fli>\u003C\u002Ful>\u003Cp>The design is opinionated in a good way. It does not pretend that every model should be judged the same way, and it does not hide the trade-offs between a pricey frontier model and a cheaper open model that can still ship real work.\u003C\u002Fp>\u003Ch2>The numbers tell a more useful story than a single rank\u003C\u002Fh2>\u003Cp>The top rows make the point clearly. \u003Ca href=\"https:\u002F\u002Fwww.anthropic.com\" target=\"_blank\" rel=\"noopener\">Anthropic\u003C\u002Fa>'s \u003Ca href=\"https:\u002F\u002Fwww.anthropic.com\u002Fnews\u002Fclaude-opus-4\" target=\"_blank\" rel=\"noopener\">Claude Opus 4.6\u003C\u002Fa> shows 39 c\u002Fs speed, 1M context, and $5.00 per million input tokens with $25.00 per million output tokens. 
\u003Ca href=\"https:\u002F\u002Fopenai.com\" target=\"_blank\" rel=\"noopener\">OpenAI\u003C\u002Fa>'s \u003Ca href=\"https:\u002F\u002Fopenai.com\u002Findex\u002Fgpt-5\u002F\" target=\"_blank\" rel=\"noopener\">GPT-5.5\u003C\u002Fa> pushes 150 c\u002Fs at the same $5.00 per million input tokens, but charges $30.00 per million output tokens, at the top end of the table.\u003C\u002Fp>\u003Cp>Then the cheaper options start making a case for themselves. \u003Ca href=\"https:\u002F\u002Fai.google.dev\u002Fgemini-api\" target=\"_blank\" rel=\"noopener\">Google\u003C\u002Fa>'s \u003Ca href=\"https:\u002F\u002Fai.google.dev\u002Fgemini-api\u002Fdocs\" target=\"_blank\" rel=\"noopener\">Gemini 3 Flash\u003C\u002Fa> sits at $0.50 input and $3.00 output per million tokens while still hitting 247 c\u002Fs. \u003Ca href=\"https:\u002F\u002Fqwenlm.github.io\" target=\"_blank\" rel=\"noopener\">Qwen\u003C\u002Fa>'s \u003Ca href=\"https:\u002F\u002Fqwenlm.github.io\u002Fblog\u002Fqwen3\u002F\" target=\"_blank\" rel=\"noopener\">Qwen3.7 Max\u003C\u002Fa> comes in at $1.25 and $3.75, with 120 c\u002Fs and a 1M context window.\u003C\u002Fp>\u003Cul>\u003Cli>\u003Ca href=\"https:\u002F\u002Fwww.anthropic.com\u002Fnews\u002Fclaude-opus-4\" target=\"_blank\" rel=\"noopener\">Claude Opus 4.6\u003C\u002Fa>: 2,132 score, 39 c\u002Fs, 1M context\u003C\u002Fli>\u003Cli>\u003Ca href=\"https:\u002F\u002Fopenai.com\u002Findex\u002Fgpt-5\u002F\" target=\"_blank\" rel=\"noopener\">GPT-5.5\u003C\u002Fa>: 2,105 score, 150 c\u002Fs, 1.1M context\u003C\u002Fli>\u003Cli>\u003Ca href=\"https:\u002F\u002Fai.google.dev\u002Fgemini-api\" target=\"_blank\" rel=\"noopener\">Gemini 3.1 Pro\u003C\u002Fa>: 2,101 score, 164 c\u002Fs, $2.50 input and $15.00 output\u003C\u002Fli>\u003Cli>\u003Ca href=\"https:\u002F\u002Fqwenlm.github.io\u002Fblog\u002Fqwen3\u002F\" target=\"_blank\" rel=\"noopener\">Qwen3.7 Max\u003C\u002Fa>: 1,634 score, 120 c\u002Fs, 1M context\u003C\u002Fli>\u003C\u002Ful>\u003Cp>If you build with models all day, the pattern is 
familiar: the fastest model is often not the cheapest, and the best benchmark score rarely arrives with friendly pricing. LLM Stats makes that tension visible without forcing you to guess from vendor marketing pages.\u003C\u002Fp>\u003Ch2>Why the methodology matters\u003C\u002Fh2>\u003Cp>The site says model order is based on coding-arena score when available, then GPQA Diamond. That choice is telling. Coding arenas are a strong proxy for practical agentic work, while GPQA Diamond catches models that can handle hard knowledge and reasoning tasks without sounding confident and wrong.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780776186961-bixx.png\" alt=\"LLM Leaderboard 2026: 300+ Models Ranked\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>LLM Stats also says it measures output throughput and time-to-first-token through standardized prompts routed across major API providers, averaged over a 7-day rolling window. That is a better fit for real usage than a one-off launch benchmark, especially when latency can change with provider load, routing, and model updates.\u003C\u002Fp>\u003Cblockquote>“The coding arena is the most discriminating signal at the frontier,” the LLM Stats FAQ says.\u003C\u002Fblockquote>\u003Cp>That line is the clearest statement of the product’s philosophy. Instead of chasing one universal number, the leaderboard tries to separate the kinds of intelligence that matter in production: code generation, deep reasoning, long-context work, and tool use.\u003C\u002Fp>\u003Cp>It also explains why the leaderboard can feel different from the usual social-media ranking posts. 
A model that wins one benchmark may still lose on price, latency, or context length, and the page keeps those trade-offs in view.\u003C\u002Fp>\u003Ch2>How the leaderboard compares on the metrics that matter\u003C\u002Fh2>\u003Cp>The comparison view is where the site becomes genuinely useful. You can put models side by side and inspect code arena, reasoning, math, coding, search, writing, vision, tools, and long-context performance in one place.\u003C\u002Fp>\u003Cp>That makes it easier to answer questions like: should I pay for a premium closed model, use a cheaper fast model, or pick an open model that is good enough for the task? The answer changes depending on whether you care about throughput, token cost, or a benchmark like \u003Ca href=\"\u002Ftag\u002Fswe-bench-verified\">SWE-bench Verified\u003C\u002Fa>.\u003C\u002Fp>\u003Cul>\u003Cli>\u003Ca href=\"https:\u002F\u002Fllm-stats.com\u002Fleaderboards\u002Fllm-leaderboard\" target=\"_blank\" rel=\"noopener\">Claude Opus 4.6\u003C\u002Fa> leads coding arena with 21.3\u003C\u002Fli>\u003Cli>\u003Ca href=\"https:\u002F\u002Fllm-stats.com\u002Fleaderboards\u002Fllm-leaderboard\" target=\"_blank\" rel=\"noopener\">Mercury 2\u003C\u002Fa> reaches 925 tok\u002Fs, the highest throughput listed\u003C\u002Fli>\u003Cli>\u003Ca href=\"https:\u002F\u002Fllm-stats.com\u002Fleaderboards\u002Fllm-leaderboard\" target=\"_blank\" rel=\"noopener\">Nemotron 3 Nano\u003C\u002Fa> costs $0.06 per 1M input tokens, the cheapest input price shown\u003C\u002Fli>\u003Cli>\u003Ca href=\"https:\u002F\u002Fllm-stats.com\u002Fleaderboards\u002Fllm-leaderboard\" target=\"_blank\" rel=\"noopener\">Grok 4 Fast\u003C\u002Fa> offers a 2.0M token context window, the largest listed\u003C\u002Fli>\u003C\u002Ful>\u003Cp>Those four numbers tell a better story than a single “best model” badge. If you are building agents, the fastest model may matter more than the highest score. 
If you are doing long-document analysis, context window can beat raw benchmark rank. If you are serving users at scale, token price can be the deciding factor.\u003C\u002Fp>\u003Cp>For OraCore readers, the practical takeaway is simple: use the leaderboard as a shortlist tool, not an oracle. Start with the metric that matches your product, then compare the top few rows instead of chasing the overall rank.\u003C\u002Fp>\u003Ch2>What changes next for model selection\u003C\u002Fh2>\u003Cp>LLM release cycles are moving fast enough that a static “best model” post goes stale almost immediately. A continuously updated board like this is more useful because it reflects live pricing, fresh benchmark submissions, and provider-side performance changes.\u003C\u002Fp>\u003Cp>My read: the next phase of model selection will look less like picking one winner and more like choosing among specialized leaders. Coding, reasoning, cheap \u003Ca href=\"\u002Ftag\u002Finference\">inference\u003C\u002Fa>, \u003Ca href=\"\u002Fnews\u002Fwhy-minimax-m3-matters-long-context-model-en\">long context\u003C\u002Fa>, and tool use are already splitting apart, and this leaderboard makes that split obvious.\u003C\u002Fp>\u003Cp>If you are shipping with \u003Ca href=\"\u002Ftag\u002Fllms\">LLMs\u003C\u002Fa> now, the smart move is to keep a shortlist of two premium models and one budget option, then re-check the numbers whenever your workload changes. 
The leaderboard already gives you the inputs; the real question is which metric your product should care about first.\u003C\u002Fp>","LLM Stats now ranks 309 models by score, speed, and price, with hourly updates from benchmarks and live API measurements.","llm-stats.com","https:\u002F\u002Fllm-stats.com\u002Fleaderboards\u002Fllm-leaderboard",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780776189065-qk79.png","tools","en","34162763-ffe3-416d-a719-e450ba87ac3d",[17,18,19,20,21],"LLM leaderboard","AI model comparison","benchmark scores","token pricing","model latency",[23,24,25],"LLM Stats tracks 309 models and updates pricing plus performance hourly.","The best model depends on the task: coding, reasoning, speed, or cost.","The leaderboard is more useful as a shortlist tool than a single winner chart.",0,"2026-06-06T20:02:37.334702+00:00","2026-06-06T20:02:37.328+00:00","b5414f5f-a1d9-47e4-b0fc-71370a82dadc",{"tags":31,"relatedLang":42,"relatedPosts":46},[32,34,36,38,40],{"name":21,"slug":33},"model-latency",{"name":17,"slug":35},"llm-leaderboard",{"name":19,"slug":37},"benchmark-scores",{"name":20,"slug":39},"token-pricing",{"name":18,"slug":41},"ai-model-comparison",{"id":15,"slug":43,"title":44,"language":45},"llm-leaderboard-2026-300-models-ranked-zh","2026 LLM 排行榜：309 模型怎麼選","zh",[47,53,59,65,71,77],{"id":48,"slug":49,"title":50,"cover_image":51,"image_url":51,"created_at":52,"category":13},"92a22a3d-6d0c-4884-9865-c1fe0f2e5e78","llama-benchy-llama-bench-style-api-benchmarks-en","llama-benchy brings llama-bench tests to APIs","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780775297695-nchl.png","2026-06-06T19:47:54.675055+00:00",{"id":54,"slug":55,"title":56,"cover_image":57,"image_url":57,"created_at":58,"category":13},"df69beef-d6a6-40d1-9284-474eebad74b7","how-to-start-vibe-coding-with-ai-en","How to 
Start Vibe Coding with AI","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780773471455-gav1.png","2026-06-06T19:17:22.823911+00:00",{"id":60,"slug":61,"title":62,"cover_image":63,"image_url":63,"created_at":64,"category":13},"bece181a-96c8-494b-ac0b-fb254413e051","nvidia-ai-models-playbook-en","NVIDIA AI Models turn model hunting into a playbook","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780771718084-8xiy.png","2026-06-06T18:48:07.10885+00:00",{"id":66,"slug":67,"title":68,"cover_image":69,"image_url":69,"created_at":70,"category":13},"40bf1841-77d9-4bf6-9764-3e956510d41a","kimi-k25-claude-code-cline-roocode-setup-en","Kimi K2.5 works in Claude Code and Cline","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780769031200-z2kv.png","2026-06-06T18:03:19.685945+00:00",{"id":72,"slug":73,"title":74,"cover_image":75,"image_url":75,"created_at":76,"category":13},"258a698f-2ab5-47bf-9b3b-ec8a8e14b8be","why-small-businesses-should-use-ai-for-admin-en","Why small businesses should use AI for admin, not everything","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780758184970-z888.png","2026-06-06T15:02:18.347592+00:00",{"id":78,"slug":79,"title":80,"cover_image":81,"image_url":81,"created_at":82,"category":13},"7da5424f-1ff8-483a-80ed-7091c5b0454b","crun-ai-gemini-omni-chat-video-editing-en","Crun AI turns Gemini Omni into chat video 
editing","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780733910991-ji5m.png","2026-06-06T08:18:00.680201+00:00",[84,89,94,99,104,109,114,119,124,129],{"id":85,"slug":86,"title":87,"created_at":88},"8008f1a9-7a00-4bad-88c9-3eedc9c6b4b1","surepath-ai-mcp-policy-controls-en","SurePath AI's New MCP Policy Controls Enhance AI Security","2026-03-26T01:26:52.222015+00:00",{"id":90,"slug":91,"title":92,"created_at":93},"27e39a8f-b65d-4f7b-a875-859e2b210156","mcp-standard-ai-tools-2026-en","MCP Standard in 2026: Integrating AI Tools","2026-03-26T01:27:43.127519+00:00",{"id":95,"slug":96,"title":97,"created_at":98},"165f9a19-c92d-46ba-b3f0-7125f662921d","rag-2026-transforming-enterprise-ai-en","How RAG in 2026 is Transforming Enterprise AI","2026-03-26T01:28:11.485236+00:00",{"id":100,"slug":101,"title":102,"created_at":103},"6a2a8e6e-b956-49d8-be12-cc47bdc132b2","mastering-ai-prompts-2026-guide-en","Mastering AI Prompts: A 2026 Guide for Developers","2026-03-26T01:29:07.835148+00:00",{"id":105,"slug":106,"title":107,"created_at":108},"3ab2c67e-4664-4c67-a013-687a2f605814","garry-tan-open-sources-claude-code-toolkit-en","Garry Tan Open-Sources a Claude Code Toolkit","2026-03-26T08:26:20.245934+00:00",{"id":110,"slug":111,"title":112,"created_at":113},"66a7cbf8-7e76-41d4-9bbf-eaca9761bf69","github-ai-projects-to-watch-in-2026-en","20 GitHub AI Projects to Watch in 2026","2026-03-26T08:28:09.752027+00:00",{"id":115,"slug":116,"title":117,"created_at":118},"9f332fda-eace-448a-a292-2283951eee71","practical-github-guide-learning-ml-2026-en","A Practical GitHub Guide to Learning ML in 2026","2026-03-27T01:16:50.125678+00:00",{"id":120,"slug":121,"title":122,"created_at":123},"1b1f637d-0f4d-42bd-974b-07b53829144d","aiml-2026-student-ai-ml-lab-repo-review-en","AIML-2026 Is a Bare-Bones Student Lab 
Repo","2026-03-27T01:21:51.661231+00:00",{"id":125,"slug":126,"title":127,"created_at":128},"6d1bf3f6-e191-4d30-b55b-8a0722fa6afe","ai-trending-github-repos-and-research-feeds-en","AI Trending Tracks Repos and Research Feeds","2026-03-27T01:31:35.709532+00:00",{"id":130,"slug":131,"title":132,"created_at":133},"010539a1-4c3a-4bd3-937a-26616422ee0d","awesome-ai-for-science-research-tools-map-en","Awesome AI for Science Is Becoming a Real Research Map","2026-03-27T01:46:50.89513+00:00"]