Tag

prompt injection

Prompt injection is the class of attacks where hidden instructions in documents, web pages, logs, or tool outputs steer an LLM or agent away from its intended task. It matters for MCP, desktop control, plugins, and trace analysis because trust boundaries, isolation, and monitoring decide what an agent can safely do.

10 articles

Prompt injection is now an AI security problem
Research/Jun 29

Prompt injection lets hidden text steer LLMs, and recent tests show models like DeepSeek-R1 can be tricked at worrying rates.

Gemini 3.5 Flash lets you script computer use
AI Agent/Jun 29

A practical breakdown of Gemini 3.5 Flash computer use, its prompt-injection defenses, and a copy-ready workflow.

Gemini 3.5 Flash makes computer use a default, not a demo
Model Releases/Jun 26

Google is right to make computer use a native Gemini 3.5 Flash feature.

OpenClaw fixes let you block agent phishing
AI Agent/Jun 20

I break down how OpenClaw got tricked into code execution and data leaks, plus the guardrails I’d ship today.

Why Microsoft’s open source AI safety tools matter for agent developm…
Tools & Apps/May 24

Microsoft’s RAMPART and Clarity push AI safety into everyday agent engineering, and that is the right move.

OpenClaw: 247,000 GitHub stars, 47,700 forks
AI Agent/May 22

OpenClaw, the open-source AI agent from Peter Steinberger, hit 247,000 GitHub stars as firms, developers, and regulators weighed its risks.

Cloudflare finds AI code review can be fooled
Research/May 4

Cloudflare found AI code reviewers can be tricked by hidden comments, with detection dropping to 53.3% and 12% in large files.

Meerkat hunts safety bugs across agent traces
Research/Apr 14

Meerkat clusters agent traces and searches them adaptively to surface rare safety violations that per-trace monitors miss.

OpenClaw Flaw Exposes AI Admin Hijack Risk
Blockchain & Web3/Apr 1

CertiK says OpenClaw’s flaws expose 135,000+ instances to token theft and admin takeover, with CVE-2026-25253 leading the list.

OpenClaw Agents Can Be Manipulated Into Failure
Research/Mar 28

Northeastern researchers found OpenClaw agents can be guilted, looped, and tricked into breaking their own tools inside a sandbox.