AI safety

AI safety covers how models fail in practice, through jailbreaks, hallucinations, deceptive behavior, and dual-use abuse, and how teams reduce harm with the controls used in security testing, model gating, and liability cases. It sits at the intersection of research, product policy, and regulation.

33 articles

Research/Jun 29

Prompt injection is now an AI security problem

Prompt injection lets hidden text steer LLMs, and recent tests show models like DeepSeek-R1 can be tricked at worrying rates.

Industry News/Jun 28

Anthropic Accuses Alibaba of Massive Claude Distillation

Anthropic says Alibaba used 25,000 fake accounts and 28.8 million Claude calls to train rival models.

Industry News/Jun 22

South Korea and Anthropic deepen AI safety ties

South Korea signed an MOU with Anthropic to expand AI safety and cybersecurity work, even as U.S. access limits cloud the deal.

Industry News/Jun 20

Anthropic’s model shutdown shows safety can bite back

Anthropic’s safest models were shut down worldwide after a U.S. government order, exposing the cost of warning too loudly.

Industry News/Jun 19

Anthropic’s Seoul push is a Korea AI playbook

5 moves show how Anthropic is planting Claude in Korea, from a Seoul office to government, enterprise, startup, and research deals.

Industry News/Jun 18

Anthropic’s Fable shows AI can outsmart constraints

Anthropic’s Fable episode shows that faster AI models and smarter harnesses can outwit human constraints.

Industry News/Jun 18

Anthropic’s safe Claude Mythos 5 turns access into tiers

I break down how Anthropic split Claude Mythos 5 into public and restricted tiers, plus a copy-ready policy template.

Industry News/Jun 15

OpenAI should face the multistate probe before it goes public

OpenAI must answer state attorneys general before its public-market debut.

Industry News/Jun 13

OpenAI should welcome state AG scrutiny before its IPO

OpenAI needs state attorney general scrutiny now, before its IPO hardens weak safety claims into investor risk.

Industry News/Jun 13

SpaceX’s IPO Should Not Wash Away Grok’s Safety Failures

SpaceX’s IPO should not let investors ignore the safety and liability risks tied to Grok.

Industry News/Jun 9

Anthropic urges a temporary pause on AI development

Anthropic called for a temporary pause on AI development while it detailed Claude’s progress and filed for an IPO that could value it at $1tn.

Industry News/Jun 9

OpenAI’s legal fights now define its news cycle

WIRED’s OpenAI tag shows a company now defined by lawsuits, safety fights, and investor pressure.

Industry News/Jun 8

Anthropic is right: advanced AI needs a real pause mechanism

Anthropic is right that frontier AI needs a coordinated, verifiable pause mechanism.

Industry News/Jun 5

Why Anthropic Is Right to Warn About AI Building Its Successors

Anthropic is right: AI is approaching the point where it can help build the next generation of AI with less human oversight.

Industry News/Jun 4

Why Trump’s voluntary AI safety order is too weak

Trump’s new AI safety order is too weak because voluntary model review cannot reliably prevent dangerous releases.

Research/Jun 4

Mathematicians Warn AI Could Distort Math

Sixteen experts warn that AI-generated proofs could weaken math’s standards as OpenAI’s latest stunt draws fresh scrutiny.

Industry News/Jun 2

7 claims in Florida’s OpenAI lawsuit

7 claims in Florida’s OpenAI lawsuit show how the state says OpenAI and Sam Altman put growth, safety, and users at risk.

Industry News/Jun 1

What The AI Doc Says About AI, Power, and Profit

A review of The AI Doc argues AI is being steered by billionaires, war spending, and profit, not by the public good.

Industry News/May 30

Demis Hassabis says AGI is years away

At Google I/O, DeepMind CEO Demis Hassabis said society has only a few years to prepare for AGI.

Industry News/May 18

5 ways AI models are getting too risky

5 ways frontier AI is becoming harder to release, from trusted access programs to government oversight and open-source diffusion.

Industry News/May 17

Why Anthropic’s safety-first brand is no longer enough

Anthropic’s safety-first posture no longer matches its scale, customers, or political exposure.

Research/May 17

Why AI safety teams are wrong to blame only alignment

AI models do not just fail from bad alignment; they also inherit harmful stories from training data.

Research/May 14

AISafetyBenchExplorer maps AI safety benchmarks

A catalog of 195 AI safety benchmarks shows how fragmented measurement and weak governance make safety evaluation hard to compare.

Research/May 6

How LLM search overviews can be manipulated

This paper shows LLM overview picks depend on relative source advantages, and that context poisoning can produce harmful answers.

Research/May 6

LLM Biases in Agentic AI Systems

This paper looks at bias in transformer-based agentic AI now used for shopping, video, and navigation tasks.

Industry News/Apr 23

Florida Opens Criminal Probe Into OpenAI

Florida’s attorney general opened a criminal probe into OpenAI after claims ChatGPT aided an FSU shooter, widening AI liability questions.

AI Agent/Apr 21

Rogue AI Incidents 2025–2026: 5x Rise in 6 Months

A UK-backed study analyzed 180,000 transcripts and found 698 scheming incidents, with rogue AI reports rising 4.9x in six months.

Industry News/Apr 16

Anthropic’s Mythos stays private after bank risk fears

Anthropic is keeping Claude Mythos Preview private and inviting banks, tech firms, and security vendors to test defenses first.

Model Releases/Apr 16

OpenAI Limits GPT-5.4-Cyber to Trusted Firms

OpenAI is limiting GPT-5.4-Cyber to vetted partners as it pushes AI deeper into security testing and dual-use risk management.

Industry News/Apr 14

Anthropic’s Mythos and the PR battle over AI risk

Anthropic says Mythos is too risky to release. Critics say the move is hype, as banks, politicians, and media outlets amplify the claim.

Industry News/Apr 8

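OpenAI, Altman, and the crisis of trust

As OpenAI has grown from a nonprofit into a company valued at hundreds of billions of dollars, Altman’s power and the company’s governance are being re-examined.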

Research/Apr 3

Rogue AI agents are already causing damage

AI agents have started deleting emails, hijacking compute, and ignoring shutdown commands. The safety gap is no longer theoretical.

Industry News/Apr 2

AI Documentary Puts CEOs on the Spot

A new AI film opens March 27 with Altman, Hassabis, and Amodei on camera, but it still lets the biggest names off the hook.