[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-stanford-2026-ai-index-charts-explained-en":3,"tags-stanford-2026-ai-index-charts-explained-en":30,"related-lang-stanford-2026-ai-index-charts-explained-en":31,"related-posts-stanford-2026-ai-index-charts-explained-en":35,"series-research-ca152f29-641a-4c5b-8ca6-47a9a95b5d77":54},{"id":4,"title":5,"content":6,"summary":7,"source":8,"source_url":9,"author":10,"image_url":11,"keywords":12,"language":19,"translated_content":10,"views":20,"is_premium":21,"created_at":22,"updated_at":22,"cover_image":11,"published_at":23,"rewrite_status":24,"rewrite_error":10,"rewritten_from_id":25,"slug":26,"category":27,"related_article_id":28,"status":29,"google_indexed_at":10,"x_posted_at":10},"ca152f29-641a-4c5b-8ca6-47a9a95b5d77","Stanford’s 2026 AI Index, explained with charts","\u003Cp>If you want a clean read on AI in 2026, start with one number: AI \u003Ca href=\"\u002Fnews\u002Fspans-mini-ai-data-centers-move-into-homes-en\">data centers\u003C\u002Fa> can now draw 29.6 gigawatts of power, about the peak demand of New York state. Another number matters just as much: more than half of people worldwide now use AI within three years of it going mainstream. That pace is hard to ignore, even if the industry still feels chaotic.\u003C\u002Fp>\u003Cp>The new \u003Ca href=\"https:\u002F\u002Faiindex.stanford.edu\u002Freport\u002F\" target=\"_blank\" rel=\"noopener\">Stanford AI Index\u003C\u002Fa> cuts through the daily noise with charts on model quality, adoption, jobs, regulation, and infrastructure. It does not read like a victory lap. It reads like a report from a technology that is improving quickly, spending aggressively, and outpacing the systems meant to measure it.\u003C\u002Fp>\u003Cp>That tension is the story. The best models keep getting better, but the cost of training and running them keeps climbing too. Meanwhile, the public, policymakers, and even benchmark designers are trying to catch up with a field that changes faster than most of its own scorecards.\u003C\u002Fp>\u003Ch2>AI is improving fast, but the gains are uneven\u003C\u002Fh2>\u003Cp>One of the clearest takeaways from the 2026 AI Index is that top-tier models are still advancing at a speed that would have sounded implausible a few years ago. On SWE-bench Verified, a benchmark for software engineering, top scores jumped from around 60% in 2024 to almost 100% in 2025. That is the kind of jump that changes how teams think about coding assistants, bug fixing, and agentic workflows.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1776427445810-u5bp.png\" alt=\"Stanford’s 2026 AI Index, explained with charts\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>But the same report also makes a point that gets lost in headline-chasing: model intelligence is jagged. A system can ace coding tasks and still fail at basic physical reasoning, household chores, or tasks that require long-term interaction with the real world. That matters because a model that writes good code is useful; a model that can reason reliably across domains is something else entirely.\u003C\u002Fp>\u003Cp>The report also notes that some AI systems are now meeting or exceeding human expert performance on tests aimed at PhD-level science, math, and language understanding. That sounds huge, and it is. 
Still, benchmark gains do not always translate into dependable real-world behavior, especially when the task is messy, open-ended, or dependent on context.</p>

<ul>
<li>SWE-bench Verified top scores rose from about 60% in 2024 to nearly 100% in 2025.</li>
<li>Robots succeed in only 12% of household tasks, which is a reminder that physical-world AI lags far behind text models.</li>
<li>Waymo now operates across five US cities, while Baidu’s Apollo Go runs rider services in China.</li>
<li>An AI system generated its own weather forecast in 2025, showing how quickly automation is moving into specialized work.</li>
</ul>

<h2>The US and China are nearly tied, but in different ways</h2>

<p>The geopolitical race is tighter than the headline chatter suggests. According to the report’s use of <a href="https://lmarena.ai/" target="_blank" rel="noopener">Arena</a>, the community ranking platform formerly known as LMSYS Chatbot Arena, the US and China are nearly neck and neck on model performance. In early 2023, <a href="https://openai.com/" target="_blank" rel="noopener">OpenAI</a> had a clear lead with ChatGPT. By 2024, that gap narrowed as <a href="https://deepmind.google/" target="_blank" rel="noopener">Google</a> and <a href="https://www.anthropic.com/" target="_blank" rel="noopener">Anthropic</a> shipped stronger models. In February 2025, DeepSeek’s R1 briefly matched the top US model.</p>

<p>As of March 2026, Anthropic leads the rankings, with xAI, Google, and OpenAI close behind. Chinese models from DeepSeek and Alibaba trail only modestly. That means the competition is no longer about a single breakout model. It is about cost, reliability, and how much value each company can squeeze out of every inference.</p>

<p>The deeper split is in infrastructure and research output. The US has an estimated 5,427 data centers, more than 10 times as many as any other country, plus more capital and more of the leading frontier labs. China leads in AI research publications, patents, and robotics. If you want the short version: the US has more model muscle, China has more industrial breadth.</p>

<blockquote>“I am stunned that this technology continues to improve, and it’s just not plateauing in any way,” said <a href="https://hcii.stanford.edu/people/yolanda-gil" target="_blank" rel="noopener">Yolanda Gil</a>, a computer scientist at the University of Southern California and coauthor of the report.</blockquote>

<p>Gil’s point matters because it pushes back on a popular assumption that frontier AI was about to hit a ceiling. The report says the opposite.
The ceiling keeps moving, and the industry keeps finding new ways to climb.</p>

<ul>
<li>US strengths: more capital, more leading models, and about 5,427 data centers.</li>
<li>China’s strengths: more publications, more patents, and stronger robotics output.</li>
<li>As of March 2026, Anthropic leads the Arena rankings, with xAI, Google, and OpenAI close behind.</li>
<li>DeepSeek’s R1 briefly matched the top US model in February 2025.</li>
</ul>

<h2>The benchmarks are lagging the products they measure</h2>

<p>If AI progress feels hard to pin down, the measurement problem is part of the reason. The Stanford report says benchmarks are struggling because models blow past their ceilings quickly. Some tests are badly designed. One popular math benchmark has a 42% error rate. Others get gamed when model builders train directly on benchmark data.</p>

<figure><img src="https://xxdpdyhzhpamafnrdkyq.supabase.co/storage/v1/object/public/covers/inline-1776427444163-dz1j.png" alt="Stanford’s 2026 AI Index, explained with charts" loading="lazy" /></figure>

<p>That means a flashy score can hide a lot. A model may look excellent on a leaderboard and still disappoint in practice, especially when it has to hold a conversation, use tools, or handle a task that changes halfway through. For agents and robots, the problem is even worse because the tests barely exist in mature form.</p>

<p>There is also a transparency problem. As competition heats up, companies like OpenAI, Anthropic, and Google have stopped disclosing training code, parameter counts, and dataset sizes. That makes independent evaluation harder and safety research messier. If you cannot inspect the ingredients, you cannot fully judge the meal.</p>

<p>This is why benchmark charts now need a healthy dose of skepticism. They still tell us something useful, but they no longer tell the whole story. The report’s message is basically: read the scores, then ask what the scores miss.</p>

<ul>
<li>One math benchmark cited in the report has a 42% error rate.</li>
<li>Training on benchmark data can inflate scores without improving real capability.</li>
<li>For AI agents and robots, standardized tests are still thin or missing.</li>
<li>Major labs are sharing less about training code, parameter counts, and dataset sizes.</li>
</ul>

<h2>AI is already changing work, but the effects are uneven</h2>

<p>AI adoption is moving faster than the personal computer or the internet did. The report says more than half of people worldwide now use AI within three years of mainstream adoption. It also says 88% of organizations use AI, and four in five university students do too. That is a staggering spread for a technology that most people still describe with uncertainty.</p>

<p>The labor impact is harder to measure, but the early signs are showing up in specific places. A 2025 Stanford economics study found employment for software developers ages 22 to 25 fell nearly 20% since 2022. That does not prove AI is the only cause. Macro conditions matter too.
Still, the timing is uncomfortable for anyone hoping entry-level knowledge work would stay insulated.</p>

<p>Productivity gains are real, at least in some settings. Research cited by the index says AI boosts productivity by 14% in customer service and 26% in software development. Those gains are less visible in work that requires judgment, context, or accountability. In other words, AI is easiest to deploy where the task has structure and the output can be checked quickly.</p>

<p>That pattern lines up with what companies are saying about headcount. A 2025 McKinsey survey found a third of organizations expect AI to shrink their workforce in the coming year, especially in service, supply chain, and software roles. That is not universal, but it is enough to change hiring plans.</p>

<ul>
<li>More than half of people worldwide use AI within three years of mainstream adoption.</li>
<li>88% of organizations and four in five university students use AI.</li>
<li>Employment for software developers ages 22 to 25 fell nearly 20% since 2022.</li>
<li>AI productivity gains: 14% in customer service and 26% in software development.</li>
</ul>

<h2>Governments are reacting, but the pace is off</h2>

<p>People are split on AI, and the split is sharper than the usual optimism-versus-fear framing suggests. An Ipsos survey cited in the index says 59% of people think AI will bring more benefits than drawbacks, while 52% say it makes them nervous. Those numbers can both be true, which is probably the most human response possible to a fast-moving technology.</p>

<p>The trust gap is even more revealing. A Pew survey cited in the report found that 73% of experts think AI will have a positive impact on how people do their jobs, while only 23% of the American public agrees. The biggest divide is work, but education and medical care show similar gaps. On elections and personal relationships, both groups are more cautious.</p>

<p>Regulation is moving, but unevenly. The <a href="https://artificialintelligenceact.eu/" target="_blank" rel="noopener">EU AI Act</a> started enforcing its first prohibitions, including bans on predictive policing and emotion recognition. Japan, South Korea, and Italy passed national AI laws. In the US, President Trump issued an executive order aimed at limiting state AI regulation, even as state legislatures passed a record 150 AI-related bills.</p>

<p>California’s SB 53 and New York’s RAISE Act are the most interesting examples here because they point toward disclosure, incident reporting, and whistleblower protections. That is a very different approach from simply telling companies to move faster and sort it out later.</p>

<p>What the report makes plain is that governance is still reactive. The technology is advancing on one clock, and oversight is running on another. Until those clocks get closer, every new model release will keep forcing the same uncomfortable question: who gets to decide what “safe enough” means?</p>

<h2>The chart to watch next is the one about power</h2>

<p>The 2026 AI Index is useful because it refuses to reduce AI to one storyline. The models are improving. The infrastructure bill is enormous. The benchmarks are shaky. The job effects are real but uneven. The policy response is active but slow.
That mix is why the charts matter more than the slogans.</p>

<p>If I had to make one prediction, it is this: the next year of AI debate will be less about whether models can do impressive things and more about whether companies can afford to keep scaling them. Power, water, chips, and regulation will shape the story as much as model quality. If you want to understand the next phase, watch the infrastructure numbers as closely as the benchmark scores.</p>

<p>And if you are building with AI right now, the practical takeaway is simple. Do not trust a leaderboard alone. Test models on your own tasks, measure failure modes, and assume the biggest constraint may come from cost or deployment, not raw capability.</p>
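<p>To make that practical takeaway concrete, here is a minimal sketch of a task-specific evaluation harness, assuming a plain Python setup. None of it comes from the report: the <code>Task</code> records, the pass/fail checks, and the <code>call_model</code> stub are placeholders you would swap for your own prompts, criteria, and model client. The point is the shape of the loop: run your own tasks, count pass rates, tally failure modes, and track cost next to accuracy.</p>

<pre><code># Minimal sketch of a task-specific eval harness (illustrative, not from the report).
# call_model is a stub: swap in your own model client and real cost accounting.
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    prompt: str                    # an input drawn from your own workload
    check: Callable[[str], bool]   # domain-specific pass/fail check
    failure_mode: str              # label tallied when the check fails


def call_model(prompt: str) -> tuple[str, float]:
    """Stub: replace with a real API call; returns (output, estimated cost in USD)."""
    return "placeholder output", 0.0


def run_eval(tasks: list[Task]) -> None:
    passes, total_cost = 0, 0.0
    failures = Counter()
    for task in tasks:
        output, cost = call_model(task.prompt)
        total_cost += cost
        if task.check(output):
            passes += 1
        else:
            failures[task.failure_mode] += 1
    # Report accuracy on your own tasks, the dominant failure modes, and spend.
    print(f"pass rate: {passes}/{len(tasks)}   est. cost: ${total_cost:.4f}")
    for mode, count in failures.most_common():
        print(f"  failed as '{mode}': {count}")


if __name__ == "__main__":
    tasks = [
        Task("Summarize this support ticket: ...", lambda o: bool(o.strip()), "empty output"),
        Task("Extract the invoice total from: ...", lambda o: "$" in o, "missing amount"),
    ]
    run_eval(tasks)
</code></pre>

<p>Even a toy harness like this surfaces the gap the report keeps pointing at: a model can sit at the top of a leaderboard and still fail your specific tasks for reasons a benchmark score never shows.</p>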
Changes","2026-03-26T01:25:07.788356+00:00",{"id":61,"slug":62,"title":63,"created_at":64},"8404bd7b-4c2f-4109-9ec4-baf29d88af2b","ml-papers-of-the-week-github-research-desk-en","ML Papers of the Week Turns GitHub Into a Research Desk","2026-03-27T01:11:39.480259+00:00",{"id":66,"slug":67,"title":68,"created_at":69},"87897a94-8065-4464-a016-1f23e89e17cc","ai-ml-conferences-to-watch-in-2026-en","AI\u002FML Conferences to Watch in 2026","2026-03-27T01:51:54.184108+00:00",{"id":71,"slug":72,"title":73,"created_at":74},"6f1987cf-25f3-47a4-b3e6-db0997695be8","openclaw-agents-manipulated-self-sabotage-en","OpenClaw Agents Can Be Manipulated Into Failure","2026-03-28T03:03:18.899465+00:00",{"id":76,"slug":77,"title":78,"created_at":79},"a53571ad-735a-4178-9f93-cb09b699d99c","vega-driving-language-instructions-en","Vega: Driving with Natural Language Instructions","2026-03-28T14:54:04.698882+00:00",{"id":81,"slug":82,"title":83,"created_at":84},"a34581d6-f36e-46da-88bb-582fb3e7425c","personalizing-autonomous-driving-styles-en","Drive My Way: Personalizing Autonomous Driving Styles","2026-03-28T14:54:26.148181+00:00",{"id":86,"slug":87,"title":88,"created_at":89},"2bc1ad7f-26ce-4f02-9885-803b35fd229d","training-knowledge-bases-writeback-rag-en","Training Knowledge Bases with WriteBack-RAG","2026-03-28T14:54:45.643433+00:00",{"id":91,"slug":92,"title":93,"created_at":94},"71adc507-3c54-4605-bbe2-c966acd6187e","packforcing-long-video-generation-en","PackForcing: Efficient Long-Video Generation Method","2026-03-28T14:55:02.646943+00:00",{"id":96,"slug":97,"title":98,"created_at":99},"675942ef-b9ec-4c5f-a997-381250b6eacb","pixelsmile-facial-expression-editing-en","PixelSmile Framework Enhances Facial Expression Editing","2026-03-28T14:55:20.633463+00:00",{"id":101,"slug":102,"title":103,"created_at":104},"6954fa2b-8b66-4839-884b-e46f89fa1bc3","adaptive-block-scaled-data-types-en","IF4: Smarter 4-Bit Quantization That Adapts to Your Data","2026-03-31T06:00:36.65963+00:00"]