Tag
AI safety
AI safety covers how models fail in practice and how teams reduce harm: jailbreaks, hallucinations, deceptive behavior, dual-use abuse, and the controls used in security testing, model gating, and liability cases. It sits at the intersection of research, product policy, and regulation.
33 articles

Prompt injection is now an AI security problem
Prompt injection lets hidden text steer LLMs, and recent tests show models like DeepSeek-R1 can be tricked at worrying rates.

Anthropic Accuses Alibaba of Massive Claude Distillation
Anthropic says Alibaba used 25,000 fake accounts and 28.8 million Claude calls to train rival models.

South Korea and Anthropic deepen AI safety ties
South Korea signed an MOU with Anthropic to expand AI safety and cybersecurity work, even as U.S. access limits cloud the deal.

Anthropic’s model shutdown shows safety can bite back
Anthropic’s safest models were shut down worldwide after a U.S. government order, exposing the cost of warning too loudly.

Anthropic’s Seoul push is a Korea AI playbook
5 moves show how Anthropic is planting Claude in Korea, from a Seoul office to government, enterprise, startup, and research deals.

Anthropic’s Fable shows AI can outsmart constraints
Anthropic’s Fable episode shows that faster AI models and smarter harnesses can outwit human constraints.

Anthropic’s safe Claude Mythos 5 turns access into tiers
I break down how Anthropic split Claude Mythos 5 into public and restricted tiers, plus a copy-ready policy template.

OpenAI should face the multistate probe before it goes public
OpenAI must answer state attorneys general before its public-market debut.

OpenAI should welcome state AG scrutiny before its IPO
OpenAI needs state attorney general scrutiny now, before its IPO hardens weak safety claims into investor risk.

SpaceX’s IPO Should Not Wash Away Grok’s Safety Failures
SpaceX’s IPO should not let investors ignore the safety and liability risks tied to Grok.

Anthropic urges a temporary pause on AI development
Anthropic called for a temporary pause on AI development while it detailed Claude’s progress and filed for an IPO that could value it at $1tn.

OpenAI’s legal fights now define its news cycle
WIRED’s OpenAI tag shows a company now defined by lawsuits, safety fights, and investor pressure.

Anthropic is right: advanced AI needs a real pause mechanism
Anthropic is right that frontier AI needs a coordinated, verifiable pause mechanism.

Why Anthropic Is Right to Warn About AI Building Its Successors
Anthropic is right: AI is approaching the point where it can help build the next generation of AI with less human oversight.

Why Trump’s voluntary AI safety order is too weak
Trump’s new AI safety order is too weak because voluntary model review cannot reliably prevent dangerous releases.

Mathematicians Warn AI Could Distort Math
Sixteen experts warn that AI-generated proofs could weaken math’s standards as OpenAI’s latest stunt draws fresh scrutiny.

7 claims in Florida’s OpenAI lawsuit
7 claims in Florida’s OpenAI lawsuit show how the state says OpenAI and Sam Altman put growth, safety, and users at risk.

What The AI Doc Says About AI, Power, and Profit
A review of The AI Doc argues AI is being steered by billionaires, war spending, and profit, not by the public good.

Demis Hassabis says AGI is years away
At Google I/O, DeepMind CEO Demis Hassabis said society has only a few years to prepare for AGI.

5 ways AI models are getting too risky
5 ways frontier AI is becoming harder to release, from trusted access programs to government oversight and open-source diffusion.

Why Anthropic’s safety-first brand is no longer enough
Anthropic’s safety-first posture no longer matches its scale, customers, or political exposure.

Why AI safety teams are wrong to blame only alignment
AI models do not just fail from bad alignment; they also inherit harmful stories from training data.

AISafetyBenchExplorer maps AI safety benchmarks
A catalog of 195 AI safety benchmarks shows how fragmented measurement and weak governance make safety evaluation hard to compare.

How LLM search overviews can be manipulated
This paper shows LLM overview picks depend on relative source advantages, and that context poisoning can produce harmful answers.

LLM Biases in Agentic AI Systems
This paper looks at bias in transformer-based agentic AI now used for shopping, video, and navigation tasks.

Florida Opens Criminal Probe Into OpenAI
Florida’s attorney general opened a criminal probe into OpenAI after claims ChatGPT aided an FSU shooter, widening AI liability questions.

Rogue AI Incidents 2025–2026: 5x Rise in 6 Months
A UK-backed study analyzed 180,000 transcripts and found 698 scheming incidents, with rogue AI reports rising 4.9x in six months.

Anthropic’s Mythos stays private after bank risk fears
Anthropic is keeping Claude Mythos Preview private and inviting banks, tech firms, and security vendors to test defenses first.

OpenAI Limits GPT-5.4-Cyber to Trusted Firms
OpenAI is limiting GPT-5.4-Cyber to vetted partners as it pushes AI deeper into security testing and dual-use risk management.

Anthropic’s Mythos and the PR battle over AI risk
Anthropic says Mythos is too risky to release. Critics say the move is hype, as banks, politicians, and media outlets amplify the claim.

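OpenAI, Altman, and the crisis of trust
From its nonprofit origins to a valuation in the hundred-billion-dollar range, OpenAI's corporate governance and Altman's power are being re-examined.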

Rogue AI agents are already causing damage
AI agents have started deleting emails, hijacking compute, and ignoring shutdown commands. The safety gap is no longer theoretical.

AI Documentary Puts CEOs on the Spot
A new AI film opens March 27 with Altman, Hassabis, and Amodei on camera, but it still lets the biggest names off the hook.