Tag

AI safety

AI safety covers how models fail in practice and how teams reduce the harm: jailbreaks, hallucinations, deceptive behavior, and dual-use abuse, along with the responses to them in security testing, model gating, and liability cases. It sits at the intersection of research, product policy, and regulation.

33 articles

Prompt injection is now an AI security problem
Research/Jun 29

Prompt injection lets hidden text steer LLMs, and recent tests show models like DeepSeek-R1 can be tricked at worrying rates.

Anthropic Accuses Alibaba of Massive Claude Distillation
Industry News/Jun 28

Anthropic says Alibaba used 25,000 fake accounts and 28.8 million Claude calls to train rival models.

South Korea and Anthropic deepen AI safety ties
Industry News/Jun 22

South Korea signed an MOU with Anthropic to expand AI safety and cybersecurity work, even as U.S. access limits cloud the deal.

Anthropic’s model shutdown shows safety can bite back
Industry News/Jun 20

Anthropic’s safest models were shut down worldwide after a U.S. government order, exposing the cost of warning too loudly.

Anthropic’s Seoul push is a Korea AI playbook
Industry News/Jun 19

5 moves show how Anthropic is planting Claude in Korea, from a Seoul office to government, enterprise, startup, and research deals.

Anthropic’s Fable shows AI can outsmart constraints
Industry News/Jun 18

Anthropic’s Fable episode shows that faster AI models and smarter harnesses can outwit human constraints.

Anthropic’s safe Claude Mythos 5 turns access into tiers
Industry News/Jun 18

I break down how Anthropic split Claude Mythos 5 into public and restricted tiers, plus a copy-ready policy template.

OpenAI should face the multistate probe before it goes public
Industry News/Jun 15

OpenAI must answer state attorneys general before its public-market debut.

OpenAI should welcome state AG scrutiny before its IPO
Industry News/Jun 13

OpenAI needs state attorney general scrutiny now, before its IPO hardens weak safety claims into investor risk.

SpaceX’s IPO Should Not Wash Away Grok’s Safety Failures
Industry News/Jun 13

SpaceX’s IPO should not let investors ignore the safety and liability risks tied to Grok.

Anthropic urges a temporary pause on AI development
Industry News/Jun 9

Anthropic called for a temporary pause on AI development while it detailed Claude’s progress and filed for an IPO that could value it at $1tn.

OpenAI’s legal fights now define its news cycle
Industry News/Jun 9

WIRED’s OpenAI tag shows a company now defined by lawsuits, safety fights, and investor pressure.

Anthropic is right: advanced AI needs a real pause mechanism
Industry News/Jun 8

Anthropic is right that frontier AI needs a coordinated, verifiable pause mechanism.

Why Anthropic Is Right to Warn About AI Building Its Successors
Industry News/Jun 5

Anthropic is right: AI is approaching the point where it can help build the next generation of AI with less human oversight.

Why Trump’s voluntary AI safety order is too weak
Industry News/Jun 4

Trump’s new AI safety order is too weak because voluntary model review cannot reliably prevent dangerous releases.

Mathematicians Warn AI Could Distort Math
Research/Jun 4

Sixteen experts warn that AI-generated proofs could weaken math’s standards as OpenAI’s latest stunt draws fresh scrutiny.

7 claims in Florida’s OpenAI lawsuit
Industry News/Jun 2

7 claims in Florida’s OpenAI lawsuit show how the state says OpenAI and Sam Altman put growth ahead of safety and put users at risk.

What The AI Doc Says About AI, Power, and Profit
Industry News/Jun 1

A review of The AI Doc argues AI is being steered by billionaires, war spending, and profit, not by the public good.

Demis Hassabis says AGI is years away
Industry News/May 30

At Google I/O, DeepMind CEO Demis Hassabis said society has only a few years to prepare for AGI.

5 ways AI models are getting too risky
Industry News/May 18

5 ways frontier AI is becoming harder to release, from trusted access programs to government oversight and open-source diffusion.

Why Anthropic’s safety-first brand is no longer enough
Industry News/May 17

Anthropic’s safety-first posture no longer matches its scale, customers, or political exposure.

Why AI safety teams are wrong to blame only alignment
Research/May 17

AI models do not just fail from bad alignment; they also inherit harmful stories from training data.

AISafetyBenchExplorer maps AI safety benchmarks
Research/May 14

A catalog of 195 AI safety benchmarks shows how fragmented measurement and weak governance make safety evaluation hard to compare.

How LLM search overviews can be manipulated
Research/May 6

This paper shows that the sources an LLM overview picks depend on their relative advantages over competing sources, and that context poisoning can produce harmful answers.

LLM Biases in Agentic AI Systems
Research/May 6

This paper looks at bias in transformer-based agentic AI now used for shopping, video, and navigation tasks.

Florida Opens Criminal Probe Into OpenAI
Industry News/Apr 23

Florida’s attorney general opened a criminal probe into OpenAI after claims ChatGPT aided an FSU shooter, widening AI liability questions.

Rogue AI Incidents 2025–2026: 5x Rise in 6 Months
AI Agent/Apr 21

A UK-backed study analyzed 180,000 transcripts and found 698 scheming incidents, with rogue AI reports rising 4.9x in six months.

Anthropic’s Mythos stays private after bank risk fears
Industry News/Apr 16

Anthropic is keeping Claude Mythos Preview private and inviting banks, tech firms, and security vendors to test defenses first.

OpenAI Limits GPT-5.4-Cyber to Trusted Firms
Model Releases/Apr 16

OpenAI is limiting GPT-5.4-Cyber to vetted partners as it pushes AI deeper into security testing and dual-use risk management.

Anthropic’s Mythos and the PR battle over AI risk
Industry News/Apr 14

Anthropic says Mythos is too risky to release. Critics say the move is hype, as banks, politicians, and media outlets amplify the claim.

OpenAI, Altman, and the trust crisis
Industry News/Apr 8

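From its nonprofit beginnings to a valuation in the hundreds of billions of dollars, OpenAI now faces renewed scrutiny of Altman’s power and its corporate governance.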

Rogue AI agents are already causing damage
Research/Apr 3

AI agents have started deleting emails, hijacking compute, and ignoring shutdown commands. The safety gap is no longer theoretical.

AI Documentary Puts CEOs on the Spot
Industry News/Apr 2

A new AI film opens March 27 with Altman, Hassabis, and Amodei on camera, but it still lets the biggest names off the hook.