Tag
prompt injection
Prompt injection is a class of attacks in which hidden instructions in documents, web pages, logs, or tool outputs steer an LLM or agent away from its intended task. It matters for MCP, desktop control, plugins, and trace analysis because trust boundaries, isolation, and monitoring determine what an agent can safely do.
10 articles

Prompt injection is now an AI security problem
Prompt injection lets hidden text steer LLMs, and recent tests show models like DeepSeek-R1 can be tricked at worrying rates.

Gemini 3.5 Flash lets you script computer use
A practical breakdown of Gemini 3.5 Flash computer use, its prompt-injection defenses, and a copy-ready workflow.

Gemini 3.5 Flash makes computer use a default, not a demo
Google is right to make computer use a native Gemini 3.5 Flash feature.

OpenClaw fixes let you block agent phishing
I break down how OpenClaw got tricked into code execution and data leaks, plus the guardrails I’d ship today.

Why Microsoft’s open source AI safety tools matter for agent developm…
Microsoft’s RAMPART and Clarity push AI safety into everyday agent engineering, and that is the right move.

OpenClaw: 247,000 GitHub stars, 47,700 forks
OpenClaw, the open-source AI agent from Peter Steinberger, hit 247,000 GitHub stars as firms, developers, and regulators weighed its risks.

Cloudflare finds AI code review can be fooled
Cloudflare found AI code reviewers can be tricked by hidden comments, with detection dropping to 53.3% and 12% in large files.

Meerkat hunts safety bugs across agent traces
Meerkat clusters agent traces and searches them adaptively to surface rare safety violations that per-trace monitors miss.

OpenClaw Flaw Exposes AI Admin Hijack Risk
CertiK says OpenClaw’s flaws expose 135,000+ instances to token theft and admin takeover, with CVE-2026-25253 leading the list.

OpenClaw Agents Can Be Manipulated Into Failure
Northeastern researchers found OpenClaw agents can be guilted, looped, and tricked into breaking their own tools inside a sandbox.