[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-ai-model-benchmarks-gpt-55-claude-gemini-grok-en":3,"article-related-ai-model-benchmarks-gpt-55-claude-gemini-grok-en":31,"series-research-29c4b64b-1ff6-4e8f-a478-a43cc9507809":76},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":23,"views":27,"created_at":28,"published_at":29,"topic_cluster_id":30},"29c4b64b-1ff6-4e8f-a478-a43cc9507809","ai-model-benchmarks-gpt-55-claude-gemini-grok-en","18 AI benchmarks now rank GPT-5.5, Claude, Gemini","\u003Cp data-speakable=\"summary\">LM Council’s \u003Ca href=\"\u002Fnews\u002Fopenai-june-2026-agents-payments-legal-heat-en\">June 2026\u003C\u002Fa> hub compares frontier AI models across 18 independent benchmarks.\u003C\u002Fp>\u003Cp>LM Council updated its \u003Ca href=\"https:\u002F\u002Flmcouncil.ai\u002Fbenchmarks\" target=\"_blank\" rel=\"noopener\">AI Model Benchmarks\u003C\u002Fa> page on June 14, 2026, pulling together 18 tests and 30+ models from sources including \u003Ca href=\"https:\u002F\u002Fepoch.ai\" target=\"_blank\" rel=\"noopener\">Epoch AI\u003C\u002Fa> and \u003Ca href=\"https:\u002F\u002Fscale.com\" target=\"_blank\" rel=\"noopener\">Scale AI\u003C\u002Fa>. 
The list tracks names such as GPT-5.5, \u003Ca href=\"\u002Ftag\u002Fclaude\">Claude\u003C\u002Fa> Opus, \u003Ca href=\"\u002Ftag\u002Fgemini\">Gemini\u003C\u002Fa> 3, and Grok 4 across reasoning, coding, math, \u003Ca href=\"\u002Ftag\u002Fagent\">agent\u003C\u002Fa> tasks, and visual benchmarks.\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>Item\u003C\u002Fth>\u003Cth>Value\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>Benchmarks tracked\u003C\u002Ftd>\u003Ctd>18\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Models in comparison set\u003C\u002Ftd>\u003Ctd>30+\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Last updated\u003C\u002Ftd>\u003Ctd>June 14, 2026\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>FrontierMath v2 release date\u003C\u002Ftd>\u003Ctd>June 12, 2026\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>What changed\u003C\u002Fh2>\u003Cp>The page is not a single leaderboard. It is an interactive comparison tool that lets users pick two models, filter results, and inspect scores across curated datasets such as Humanity’s Last Exam, \u003Ca href=\"\u002Ftag\u002Fswe-bench-verified\">SWE-bench Verified\u003C\u002Fa>, GPQA Diamond, FrontierMath, Terminal-Bench 2.0, and GeoBench.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781636573742-hzva.png\" alt=\"18 AI benchmarks now rank GPT-5.5, Claude, Gemini\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Several benchmarks on the page are independently run, which means the numbers may differ from vendor-reported results. 
LM Council also notes that the \u003Ca href=\"\u002Ftag\u002Fbenchmark\">benchmark\u003C\u002Fa> set is curated by AI Explained, with a focus on widely watched tests rather than model-maker marketing claims.\u003C\u002Fp>\u003Cul>\u003Cli>Humanity’s Last Exam: Gemini 3.1 Pro Preview leads at 46.4% ±2.0.\u003C\u002Fli>\u003Cli>SWE-bench Verified: Claude Opus 4.7 (max) tops the chart at 83.5% ±1.7.\u003C\u002Fli>\u003Cli>GPQA Diamond: GPT-5.4 Pro (xhigh) leads at 94.6% ±1.6.\u003C\u002Fli>\u003Cli>FrontierMath Tiers 1-3 v2: GPT-5.5 Pro (xhigh) scores 87.7% ±1.9.\u003C\u002Fli>\u003C\u002Ful>\u003Cp>Other results show a split by task type. Claude models do well on coding and terminal work, GPT-5 variants lead several math and knowledge tests, and Gemini 3 models post strong scores in visual physics and geography tasks.\u003C\u002Fp>\u003Ch2>Why it matters\u003C\u002Fh2>\u003Cp>For developers, the page offers a quick way to compare model fit by workload instead of relying on one headline score. 
A model that wins on math may lag on code fixes, while another may do better on long-context or terminal-based agent tasks.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781636586358-0v5g.png\" alt=\"18 AI benchmarks now rank GPT-5.5, Claude, Gemini\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>For buyers and teams watching the market, the hub makes it easier to spot where frontier models are separating from one another and where the gaps are small enough that price, latency, or tool support may matter more than raw benchmark rank.\u003C\u002Fp>\u003Cp>The main takeaway is that June 2026 model selection looks less like choosing a single “best” model and more like matching the benchmark to the job.\u003C\u002Fp>","LM Council’s June 2026 benchmark hub compares 30+ models across 18 tests, with fresh scores from Epoch AI, Scale AI, and others.","lmcouncil.ai","https:\u002F\u002Flmcouncil.ai\u002Fbenchmarks",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781636573742-hzva.png","research","en","83b83aaf-90bf-44d6-a2c8-74665bfe99b8",[17,18,19,20,21,22],"AI benchmarks","GPT-5.5","Claude Opus","Gemini 3","Epoch AI","Scale AI",[24,25,26],"LM Council updated an 18-benchmark comparison hub on June 14, 2026.","The page compares 30+ models, including GPT-5.5, Claude Opus, Gemini 3, and Grok 4.","Independent benchmarks show different leaders by task, from math to coding to visual reasoning.",0,"2026-06-16T19:02:23.681596+00:00","2026-06-16T19:02:23.675+00:00","3a949a81-75cc-4a29-a9ce-24903ce51366",{"tags":32,"relatedLang":35,"relatedPosts":39},[33],{"name":17,"slug":34},"ai-benchmarks",{"id":15,"slug":36,"title":37,"language":38},"ai-model-benchmarks-gpt-55-claude-gemini-en-zh","18 項 AI 基準更新：GPT-5.5、Claude、Gemini 
同場比拼","zh",[40,46,52,58,64,70],{"id":41,"slug":42,"title":43,"cover_image":44,"image_url":44,"created_at":45,"category":13},"d1c56a9f-a495-46df-b7f7-3a6036031e56","phase-noise-information-aging-massive-mimo-en","Phase noise makes massive MIMO information age","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781641074734-76ux.png","2026-06-16T20:17:28.34729+00:00",{"id":47,"slug":48,"title":49,"cover_image":50,"image_url":50,"created_at":51,"category":13},"99c24ad4-5a05-4bd8-a1fc-1c9676530a3a","exact-posterior-scores-inverse-problems-en","Exact posterior scores for inverse problems","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781591573015-t209.png","2026-06-16T06:32:32.175258+00:00",{"id":53,"slug":54,"title":55,"cover_image":56,"image_url":56,"created_at":57,"category":13},"79767774-adbe-4e97-93d9-9c5bf674b35e","contextrl-teaches-llms-to-pick-right-evidence-en","ContextRL teaches LLMs to pick the right evidence","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781590673379-8nq0.png","2026-06-16T06:17:30.366185+00:00",{"id":59,"slug":60,"title":61,"cover_image":62,"image_url":62,"created_at":63,"category":13},"01f05d3f-fb22-4194-b211-bfe8e02bd544","language-models-value-axis-en","Language models have a “value axis”","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781589776527-cruc.png","2026-06-16T06:02:35.947355+00:00",{"id":65,"slug":66,"title":67,"cover_image":68,"image_url":68,"created_at":69,"category":13},"1770f0e4-4b10-459d-bb9b-be13075b1a3d","persona-pruner-lightweight-role-playing-models-en","Persona-Pruner trims models for 
role-playing","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781505171903-58bv.png","2026-06-15T06:32:25.55966+00:00",{"id":71,"slug":72,"title":73,"cover_image":74,"image_url":74,"created_at":75,"category":13},"2a85882b-ba8c-44c8-809e-e19691776f37","clinhallu-medical-mllm-hallucination-benchmark-en","ClinHallu maps where medical MLLMs hallucinate","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781504273229-o70v.png","2026-06-15T06:17:23.262119+00:00",[77,82,87,92,97,102,107,112,117,122],{"id":78,"slug":79,"title":80,"created_at":81},"a2715e72-1fe8-41b3-abb1-d0cf1f710189","ai-predictions-2026-big-changes-en","AI Predictions for 2026: Brace for Big Changes","2026-03-26T01:25:07.788356+00:00",{"id":83,"slug":84,"title":85,"created_at":86},"8404bd7b-4c2f-4109-9ec4-baf29d88af2b","ml-papers-of-the-week-github-research-desk-en","ML Papers of the Week Turns GitHub Into a Research Desk","2026-03-27T01:11:39.480259+00:00",{"id":88,"slug":89,"title":90,"created_at":91},"87897a94-8065-4464-a016-1f23e89e17cc","ai-ml-conferences-to-watch-in-2026-en","AI\u002FML Conferences to Watch in 2026","2026-03-27T01:51:54.184108+00:00",{"id":93,"slug":94,"title":95,"created_at":96},"6f1987cf-25f3-47a4-b3e6-db0997695be8","openclaw-agents-manipulated-self-sabotage-en","OpenClaw Agents Can Be Manipulated Into Failure","2026-03-28T03:03:18.899465+00:00",{"id":98,"slug":99,"title":100,"created_at":101},"a53571ad-735a-4178-9f93-cb09b699d99c","vega-driving-language-instructions-en","Vega: Driving with Natural Language Instructions","2026-03-28T14:54:04.698882+00:00",{"id":103,"slug":104,"title":105,"created_at":106},"a34581d6-f36e-46da-88bb-582fb3e7425c","personalizing-autonomous-driving-styles-en","Drive My Way: Personalizing Autonomous Driving 
Styles","2026-03-28T14:54:26.148181+00:00",{"id":108,"slug":109,"title":110,"created_at":111},"2bc1ad7f-26ce-4f02-9885-803b35fd229d","training-knowledge-bases-writeback-rag-en","Training Knowledge Bases with WriteBack-RAG","2026-03-28T14:54:45.643433+00:00",{"id":113,"slug":114,"title":115,"created_at":116},"71adc507-3c54-4605-bbe2-c966acd6187e","packforcing-long-video-generation-en","PackForcing: Efficient Long-Video Generation Method","2026-03-28T14:55:02.646943+00:00",{"id":118,"slug":119,"title":120,"created_at":121},"675942ef-b9ec-4c5f-a997-381250b6eacb","pixelsmile-facial-expression-editing-en","PixelSmile Framework Enhances Facial Expression Editing","2026-03-28T14:55:20.633463+00:00",{"id":123,"slug":124,"title":125,"created_at":126},"6954fa2b-8b66-4839-884b-e46f89fa1bc3","adaptive-block-scaled-data-types-en","IF4: Smarter 4-Bit Quantization That Adapts to Your Data","2026-03-31T06:00:36.65963+00:00"]