[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-grok-build-goal-autonomous-coding-en":3,"article-related-grok-build-goal-autonomous-coding-en":30,"series-ai-agent-0003f204-e4d0-4015-8208-bbd23ecfb908":77},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":22,"views":26,"created_at":27,"published_at":28,"topic_cluster_id":29},"0003f204-e4d0-4015-8208-bbd23ecfb908","grok-build-goal-autonomous-coding-en","Grok Build Adds \u002Fgoal for Autonomous Coding","\u003Cp data-speakable=\"summary\">xAI’s Grok Build now has \u002Fgoal, a mode that plans, executes, and verifies coding tasks on the developer’s machine.\u003C\u002Fp>\u003Cp>\u003Ca href=\"https:\u002F\u002Fwww.techtimes.com\u002Farticles\u002F318976\u002F20260624\u002Fgrok-build-ships-autonomous-execution-xai-agent-now-plans-runs-verifies.htm\" target=\"_blank\" rel=\"noopener\">xAI\u003C\u002Fa> shipped \u003Cstrong>\u002Fgoal\u003C\u002Fstrong> in Grok Build on \u003Cstrong>June 22, 2026\u003C\u002Fstrong>, and the pitch is simple: give the \u003Ca href=\"\u002Ftag\u002Fagent\">agent\u003C\u002Fa> one objective, then let it run until it can prove the job is done. That puts Grok Build into a tighter race with \u003Ca href=\"https:\u002F\u002Fwww.anthropic.com\u002Fclaude-code\" target=\"_blank\" rel=\"noopener\">Claude Code\u003C\u002Fa> and \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fopenai\u002Fcodex\" target=\"_blank\" rel=\"noopener\">OpenAI Codex CLI\u003C\u002Fa>, but with a more aggressive autonomy story.\u003C\u002Fp>\u003Cp>What makes this launch worth paying attention to is the verification loop. A lot of coding agents can edit files and summarize what they changed. Fewer can keep working until they test the result, inspect the output, and fix their own mistakes before handing control back.\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>Metric\u003C\u002Fth>\u003Cth>Value\u003C\u002Fth>\u003Cth>What it means\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>\u002Fgoal launch date\u003C\u002Ftd>\u003Ctd>June 22, 2026\u003C\u002Ftd>\u003Ctd>Autonomous mode is live now\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>SuperGrok\u003C\u002Ftd>\u003Ctd>$30\u002Fmonth\u003C\u002Ftd>\u003Ctd>Lowest entry tier for Grok Build\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>X Premium Plus\u003C\u002Ftd>\u003Ctd>$40\u002Fmonth\u003C\u002Ftd>\u003Ctd>Another access path to the CLI\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>SuperGrok Heavy\u003C\u002Ftd>\u003Ctd>$300\u002Fmonth\u003C\u002Ftd>\u003Ctd>High-usage tier\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Grok Build 0.1 context window\u003C\u002Ftd>\u003Ctd>256,000 tokens\u003C\u002Ftd>\u003Ctd>Large enough for long sessions\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Earlier SWE-Bench Verified score\u003C\u002Ftd>\u003Ctd>70.8%\u003C\u002Ftd>\u003Ctd>Benchmark baseline for xAI’s coder\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Claude Code Opus 4.7 score\u003C\u002Ftd>\u003Ctd>87.6%\u003C\u002Ftd>\u003Ctd>Competitive benchmark target\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>What \u002Fgoal actually changes\u003C\u002Fh2>\u003Cp>Standard coding agents still work like a chat loop. You ask for a change, the model writes code, you review the result, and the next prompt comes from you. \u002Fgoal changes that rhythm by turning one prompt into a bounded task with its own checklist, progress panel, and completion test.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782374583821-11kl.png\" alt=\"Grok Build Adds \u002Fgoal for Autonomous Coding\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>That matters because developer time is often lost in the handoff between generation and verification. A model can produce code that compiles and still ship a broken feature. xAI is trying to move the verification step inside the agent itself, so the system keeps iterating until it can defend its own output.\u003C\u002Fp>\u003Cp>The feature also keeps the human in the loop in a lighter way. Developers can check live status, pause the run, resume it, cancel it, or add new instructions while the agent works. The difference is that they no longer have to approve every step of the task.\u003C\u002Fp>\u003Cul>\u003Cli>\u003Cstrong>\u002Fgoal status\u003C\u002Fstrong> shows live progress\u003C\u002Fli>\u003Cli>\u003Cstrong>\u002Fgoal pause\u003C\u002Fstrong> stops execution temporarily\u003C\u002Fli>\u003Cli>\u003Cstrong>\u002Fgoal resume\u003C\u002Fstrong> continues the run\u003C\u002Fli>\u003Cli>\u003Cstrong>\u002Fgoal clear\u003C\u002Fstrong> cancels the task\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>Verification is the real headline\u003C\u002Fh2>\u003Cp>xAI says \u002Fgoal can verify work in three ways: by reviewing the code it produced, by inspecting web pages to confirm runtime behavior, or by running scripts directly. That is a smarter design than a simple “done” message, because it gives the agent a chance to catch failures before the developer does.\u003C\u002Fp>\u003Cp>That approach tackles a common weakness in first-wave coding agents. They often look productive, but their confidence is higher than their accuracy. The new mode tries to reduce that gap by forcing a proof step before completion.\u003C\u002Fp>\u003Cblockquote>“Coding agents are becoming the procurement front where AI labs compete to own the developer workflow.” — Mitch Ashley, VP and practice lead for software lifecycle engineering at The Futurum Group\u003C\u002Fblockquote>\u003Cp>Ashley’s point is worth sitting with. The competition is no longer just about who can write the best snippet. It is about who can own the way teams plan, test, and ship software every day.\u003C\u002Fp>\u003Cp>There is still an open question here: can a model verify work it was also responsible for generating? If the generator and verifier are too similar, the check can become shallow self-approval. xAI has not published enough detail to prove that \u002Fgoal avoids that trap.\u003C\u002Fp>\u003Ch2>The two-model setup and why it matters\u003C\u002Fh2>\u003Cp>\u002Fgoal uses both \u003Ca href=\"https:\u002F\u002Fx.ai\u002Fgrok\" target=\"_blank\" rel=\"noopener\">Grok Build\u003C\u002Fa> 0.1 and \u003Ca href=\"https:\u002F\u002Fx.ai\" target=\"_blank\" rel=\"noopener\">Composer 2.5\u003C\u002Fa>. xAI says one model handles planning and instruction-following while the other handles code generation and execution. On paper, that split sounds sensible: one model reasons about the task, the other does the mechanical work.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782374589928-boxd.png\" alt=\"Grok Build Adds \u002Fgoal for Autonomous Coding\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>The catch is independence. If both models share similar training signals and failure modes, the verifier may miss the same bugs the generator misses. That is why production testing matters more than the architecture diagram. Developers will find out quickly whether the second pass catches real mistakes or just echoes the first pass with cleaner wording.\u003C\u002Fp>\u003Cp>There is one practical detail that gives Grok Build a real security argument: all code runs on the developer’s local machine. Nothing in the codebase is sent to xAI’s servers during a session. For teams in finance, healthcare, or government, that local-first approach can matter as much as raw model quality.\u003C\u002Fp>\u003Cul>\u003Cli>\u003Ca href=\"https:\u002F\u002Fdocs.x.ai\u002F\" target=\"_blank\" rel=\"noopener\">xAI docs\u003C\u002Fa> describe the CLI workflow and access tiers\u003C\u002Fli>\u003Cli>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fopenai\u002Fcodex\" target=\"_blank\" rel=\"noopener\">Codex CLI\u003C\u002Fa> is one of the main alternatives\u003C\u002Fli>\u003Cli>\u003Ca href=\"https:\u002F\u002Fwww.anthropic.com\u002Fclaude-code\" target=\"_blank\" rel=\"noopener\">Claude Code\u003C\u002Fa> still leads on benchmark reputation\u003C\u002Fli>\u003Cli>\u003Ca href=\"https:\u002F\u002Fai.google.dev\u002Fgemini-api\u002Fdocs\u002Fcode-assist\" target=\"_blank\" rel=\"noopener\">Gemini Code Assist\u003C\u002Fa> is another enterprise option\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>How Grok Build compares with the competition\u003C\u002Fh2>\u003Cp>The \u003Ca href=\"\u002Ftag\u002Fbenchmark\">benchmark\u003C\u002Fa> numbers are the part xAI cannot ignore. The earlier grok-code-fast-1 model scored \u003Cstrong>70.8%\u003C\u002Fstrong> on \u003Ca href=\"\u002Ftag\u002Fswe-bench-verified\">SWE-Bench Verified\u003C\u002Fa>. \u003Ca href=\"\u002Ftag\u002Fclaude-code\">Claude Code\u003C\u002Fa> on \u003Ca href=\"\u002Ftag\u002Fopus-47\">Opus 4.7\u003C\u002Fa> scores \u003Cstrong>87.6%\u003C\u002Fstrong> on the same benchmark. xAI has not published a fresh score for the current production \u003Cstrong>grok-build-0.1\u003C\u002Fstrong> model.\u003C\u002Fp>\u003Cp>That gap is large enough that no one should pretend Grok Build has already caught up on raw coding skill. What xAI is arguing instead is that long-running autonomous execution changes the question. If the agent keeps testing and fixing until the output works, the final result may matter more than a single benchmark pass.\u003C\u002Fp>\u003Cp>That is a fair argument, but it needs proof in real projects. Benchmarks still matter because they predict how often an agent gets stuck, hallucinates a fix, or misses edge cases. A stronger workflow can hide some weakness, but it cannot erase it.\u003C\u002Fp>\u003Cul>\u003Cli>\u003Cstrong>Grok Build\u003C\u002Fstrong>: 70.8% SWE-Bench Verified on the earlier coder model\u003C\u002Fli>\u003Cli>\u003Cstrong>Claude Code\u003C\u002Fstrong>: 87.6% on SWE-Bench Verified\u003C\u002Fli>\u003Cli>\u003Cstrong>Context window\u003C\u002Fstrong>: 256,000 tokens for Grok Build 0.1\u003C\u002Fli>\u003Cli>\u003Cstrong>Access cost\u003C\u002Fstrong>: $30, $40, or $300 per month depending on tier\u003C\u002Fli>\u003C\u002Ful>\u003Cp>One more competitive wrinkle: xAI is also planning \u003Cstrong>Arena Mode\u003C\u002Fstrong>, which would run multiple agents in parallel and pick the best output. If that ships, Grok Build could compensate for weaker single-run performance by choosing among several attempts instead of trusting one answer.\u003C\u002Fp>\u003Ch2>What developers should watch next\u003C\u002Fh2>\u003Cp>The most interesting test is not whether \u002Fgoal sounds impressive in a demo. It is whether it reduces the number of times a developer has to reopen a task because the agent said “done” too early. That is the kind of failure mode teams feel immediately.\u003C\u002Fp>\u003Cp>Grok Build is now moving fast enough that the product story is changing every few weeks: beta launch, Composer 2.5, plugin marketplace, and now autonomous execution. The pace suggests xAI wants Grok Build to become a daily coding tool, not just a novelty CLI.\u003C\u002Fp>\u003Cp>My read is simple: \u002Fgoal is a meaningful product step, but it is not a verdict on xAI’s coding quality. The next few weeks of real-world use will tell us whether the verification loop is genuine or just a nicer wrapper around the same old agent mistakes. If you are evaluating coding agents for a team, the right question is whether the tool can finish a task without making you become the test runner.\u003C\u002Fp>\u003Cp>For OraCore readers, this is the feature to watch: if \u002Fgoal can reliably close the loop on local code changes, xAI may have found a workflow advantage even before it closes the benchmark gap.\u003C\u002Fp>","xAI’s Grok Build now has \u002Fgoal, a mode that plans, executes, and verifies coding tasks on the developer’s machine.","www.techtimes.com","https:\u002F\u002Fwww.techtimes.com\u002Farticles\u002F318976\u002F20260624\u002Fgrok-build-ships-autonomous-execution-xai-agent-now-plans-runs-verifies.htm",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782374583821-11kl.png","ai-agent","en","08c3c919-2446-4dda-85fb-c18b6ffc3b8d",[17,18,19,20,21],"xAI","Grok Build","autonomous coding","\u002Fgoal","Claude Code",[23,24,25],"\u002Fgoal lets Grok Build plan, execute, and verify a task with less human back-and-forth.","xAI says the mode uses Composer 2.5 and Grok Build 0.1 in a two-model pipeline.","The big question is whether its self-verification catches real bugs or just confirms its own output.",0,"2026-06-25T08:02:38.973865+00:00","2026-06-25T08:02:38.956+00:00","a9bee732-b07c-4e5b-a0e6-3048577e32a7",{"tags":31,"relatedLang":36,"relatedPosts":40},[32,34],{"name":17,"slug":33},"xai",{"name":21,"slug":35},"claude-code",{"id":15,"slug":37,"title":38,"language":39},"grok-build-goal-autonomous-coding-zh","Grok Build 加上 \u002Fgoal，自動寫碼更像樣了","zh",[41,47,53,59,65,71],{"id":42,"slug":43,"title":44,"cover_image":45,"image_url":45,"created_at":46,"category":13},"07fb3bcc-9f38-4153-a9c8-5d67ba7f5018","codex-third-party-model-integration-guide-en","Codex 接入第三方模型完整指南","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782396176284-xvyq.png","2026-06-25T14:02:29.820439+00:00",{"id":48,"slug":49,"title":50,"cover_image":51,"image_url":51,"created_at":52,"category":13},"c41b19d2-48c8-4d88-92f9-d92d73cf9e90","set-up-ai-agent-workflows-5-practical-steps-en","Set Up AI Agent Workflows in 5 Practical Steps","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782314280123-7fv1.png","2026-06-24T15:17:28.642801+00:00",{"id":54,"slug":55,"title":56,"cover_image":57,"image_url":57,"created_at":58,"category":13},"61c1e05c-ea78-4f0a-b389-3f09eeabf7e3","anthropic-claude-tag-research-slack-search-en","Anthropic’s Claude Tag Research turns Slack into search","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782285508413-kwit.png","2026-06-24T07:18:03.3764+00:00",{"id":60,"slug":61,"title":62,"cover_image":63,"image_url":63,"created_at":64,"category":13},"8dbcd7ac-bae7-46c1-ba11-bdca1fd774e8","benchmark-harness-quality-beats-model-hype-coding-en","This benchmark proves harness quality beats model hype in coding","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782253062750-lxuw.png","2026-06-23T22:17:21.750686+00:00",{"id":66,"slug":67,"title":68,"cover_image":69,"image_url":69,"created_at":70,"category":13},"020c80c8-c92f-4f7b-a175-0cb29bd1b8c7","glm-5-kill-vibe-coding-agent-engineering-en","GLM-5 Is Right to Kill Vibe Coding and Push Agent Engineering","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782223372408-840t.png","2026-06-23T14:02:24.351865+00:00",{"id":72,"slug":73,"title":74,"cover_image":75,"image_url":75,"created_at":76,"category":13},"0e0ff333-9be7-4ea6-9a5f-3cf28ded9856","loop-engineering-claude-code-workflow-en","Loop Engineering: Claude Code背后的新工作法","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782205390205-vbwl.png","2026-06-23T09:02:37.953008+00:00",[78,83,88,93,98,103,108,113,118,123],{"id":79,"slug":80,"title":81,"created_at":82},"03db8de8-8dc2-4ac1-9cf7-898782efbb1f","anthropic-claude-ai-agent-task-automation-en","Anthropic's Claude AI Agent: A New Era of Task Automation","2026-03-25T16:25:06.513026+00:00",{"id":84,"slug":85,"title":86,"created_at":87},"045d1abc-190d-4594-8c95-91e2a26f0c5a","googles-2026-ai-agent-report-decoded-en","Google’s 2026 AI Agent Report, Decoded","2026-03-26T11:15:23.046616+00:00",{"id":89,"slug":90,"title":91,"created_at":92},"e64aba21-254b-4f93-aa21-837484bb52ec","kimi-k25-review-stronger-still-not-legend-en","Kimi K2.5 review: stronger, still not a legend","2026-03-27T07:15:55.385951+00:00",{"id":94,"slug":95,"title":96,"created_at":97},"30dfb781-a1b2-4add-aebe-b3df40247c37","claude-code-controls-mac-desktop-en","Claude Code now controls your Mac desktop","2026-03-28T03:01:59.384091+00:00",{"id":99,"slug":100,"title":101,"created_at":102},"254405b6-7833-4800-8e13-f5196deefbe6","cloudflare-100x-faster-ai-agent-sandbox-en","Cloudflare’s 100x Faster AI Agent Sandbox","2026-03-28T03:09:44.356437+00:00",{"id":104,"slug":105,"title":106,"created_at":107},"04f29b7f-9b91-4306-89a7-97d725e6e1ba","openai-backs-isara-agent-swarm-bet-en","OpenAI backs Isara’s agent-swarm bet","2026-03-28T03:15:27.849766+00:00",{"id":109,"slug":110,"title":111,"created_at":112},"3b0bf479-e4ae-4703-9666-721a7e0cdb91","openai-plan-automated-ai-researcher-en","OpenAI’s plan for an automated AI researcher","2026-03-28T03:17:42.312819+00:00",{"id":114,"slug":115,"title":116,"created_at":117},"fe91bce0-b85d-4efa-a207-24ae9939c29f","harness-engineering-ai-agent-reliability-2026","Harness Engineering: From Bridle to Operating System, The Missing Link in AI Agent Reliability","2026-03-31T06:36:55.648751+00:00",{"id":119,"slug":120,"title":121,"created_at":122},"7a09007d-820f-43b3-8607-8ad1bfcb94c8","mcp-explained-from-prompts-to-production-en","MCP Explained: From Prompts to Production","2026-04-01T09:24:40.089177+00:00",{"id":124,"slug":125,"title":126,"created_at":127},"116d5ee9-a4f1-4b5a-aac5-5d035dd22bbe","amazon-bedrock-agents-multi-agent-workflows-en","Amazon Bedrock Agents Gets Multi-Agent Workflows","2026-04-01T09:30:30.197685+00:00"]