Anthropic Accuses Alibaba of Massive Claude Distillation
Anthropic says Alibaba used 25,000 fake accounts and 28.8 million Claude calls to train rival models.

Anthropic has accused Alibaba of running one of the largest model distillation operations it has seen, with the claim centering on 25,000 fake accounts and 28.8 million Claude interactions. The allegation matters because it turns a familiar AI safety issue into a scale problem: if the numbers are accurate, this was not casual abuse but industrialized data extraction.
The dispute also lands in a tense moment for the AI industry, where model providers are tightening usage rules while rivals keep finding ways to probe, imitate, and repurpose frontier systems. If Anthropic’s numbers hold up, the story is less about one company’s policy violation and more about how hard it is to keep a closed model from becoming training fuel for everyone else.
| Claim | Detail | Why it matters |
|---|---|---|
| Fake accounts | 25,000 | Suggests organized abuse rather than a few rogue users |
| Claude interactions | 28.8 million | Points to data collection at very large scale |
| Targeted model | Claude | One of the most closely watched frontier AI systems |
What Anthropic says happened
Anthropic says the operation used large numbers of fake accounts to query Claude and gather outputs that could be used to train competing systems. In plain English, the accusation is that someone treated Claude like a giant answer machine, then fed the responses into model training pipelines.

The claim is serious not because distillation is unusual: model distillation is a normal technique in AI development. What matters is how it is done. If a company collects another provider's outputs at scale without permission, the practice moves from standard engineering into policy abuse, and possibly into breach of contract.
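For context on the technique itself: in its classic form, distillation trains a smaller "student" model to match a "teacher" model's temperature-softened output probabilities. The minimal pure-Python sketch below computes that loss on made-up logits; it illustrates the general technique only and is not a description of Anthropic's or Alibaba's systems. Note that an API returning only text exposes no logits, so extraction through an API would more likely take the form of fine-tuning a model on collected prompt-response pairs.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives softer probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2 as in classic distillation."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student's predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl


# Hypothetical logits for a three-class example.
teacher = [4.0, 1.0, 0.5]
print(distillation_loss(teacher, teacher))          # 0.0: student matches teacher
print(distillation_loss(teacher, [0.5, 1.0, 4.0]))  # positive: student disagrees
```

Training minimizes this loss, pulling the student's distribution toward the teacher's. The temperature exposes how the teacher ranks the less likely answers, which is much of what makes a strong model's outputs valuable as training signal.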
Anthropic has spent the last year positioning itself as a lab that takes misuse prevention seriously, which makes this accusation part technical dispute and part trust test. The company’s newsroom and safety messaging have consistently emphasized controlled access, monitoring, and usage limits.
“We are seeing a rising tide of AI misuse.” — Dario Amodei, Anthropic CEO, in a 2024 interview about AI safety and model abuse
Why the numbers matter
The scale in this case is what makes it hard to dismiss. Twenty-five thousand accounts is enough to suggest an operation with tooling, automation, and account rotation, and 28.8 million interactions point to a long-running extraction effort rather than a one-off test.
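A quick back-of-the-envelope check puts the per-account load in perspective. Assuming, purely for illustration, that traffic was split evenly across accounts (the allegation does not say how it was distributed), each account averaged about 1,152 interactions:

```python
ACCOUNTS = 25_000          # fake accounts, per Anthropic's claim
INTERACTIONS = 28_800_000  # Claude interactions, per Anthropic's claim

# Assumes an even split across accounts; the real distribution is not public.
per_account = INTERACTIONS / ACCOUNTS
print(f"{per_account:,.0f} interactions per account on average")
# prints "1,152 interactions per account on average"
```

That average is modest for a single account, which is part of why spreading queries across many accounts can make extraction harder to spot from any one account's activity.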
Those figures also help explain why this accusation got attention outside Anthropic. AI firms have warned for years that model outputs can be copied, compressed, and reused, but most public conversations stayed abstract. Here, the alleged volume is concrete enough to make the risk feel operational rather than theoretical.
- 25,000 fake accounts can support distributed querying and rate-limit evasion.
- 28.8 million interactions can generate a large training corpus fast.
- Claude is a premium target because its outputs are useful for instruction tuning.
- Large-scale querying can blur the line between product use and data harvesting.
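The rate-limit point can be made concrete with a minimal sketch, assuming a simple per-account token-bucket limiter (a common pattern, not a description of how Anthropic actually limits traffic). The capacity and refill rate are hypothetical. The limiter caps each account, but aggregate throughput grows with the number of accounts:

```python
class TokenBucket:
    """Per-account limiter: bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity  # start full
        self.last = 0.0         # timestamp of the previous check, in seconds

    def allow(self, now: float) -> bool:
        """Spend a token and return True if the request fits the budget at time `now`."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def accepted_requests(num_accounts, requests_per_account, capacity=10, rate=0.1):
    """Send a burst from every account at t=0 and count how many requests get through."""
    buckets = [TokenBucket(capacity, rate) for _ in range(num_accounts)]
    return sum(
        bucket.allow(now=0.0)
        for bucket in buckets
        for _ in range(requests_per_account)
    )


print(accepted_requests(1, 100))    # 10: one account is capped by its bucket
print(accepted_requests(100, 100))  # 1000: same per-account cap, 100x the throughput
```

This is why per-account limits alone do little against account rotation, and why providers pair them with identity checks and cross-account analysis.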
There is also a strategic angle. If a rival can cheaply collect enough high-quality responses from a top-tier model, it can compress development time on its own systems. That is why frontier labs care so much about access controls, logging, and anomaly detection.
Alibaba’s role and the open questions
The accusation names Alibaba, a company with deep AI ambitions and a broad cloud business. But the important question is not only whether Alibaba was involved directly. It is also whether the activity came from an internal team, a contractor, or some other affiliated group.

That distinction matters because large organizations often have multiple product teams, research groups, and external partners touching the same infrastructure. If the allegation is accurate, investigators will want to know who created the fake accounts, who paid for the queries, and where the resulting data ended up.
Alibaba’s AI work includes the Qwen model family and a broader cloud AI stack. Any suggestion that these systems benefited from unauthorized Claude output would raise obvious questions about competition, compliance, and internal governance.
- Who controlled the 25,000 accounts?
- Which teams or vendors generated the 28.8 million calls?
- Was the data used for training, evaluation, or both?
- Did the activity violate service terms or security controls?
Those questions will matter more than the social media heat around the accusation. In AI disputes, the details decide whether a story becomes a legal case, a policy update, or just another round of public posturing.
How this compares with other AI abuse cases
This kind of allegation is part of a wider pattern in AI. Labs have already seen prompt scraping, automated account creation, API abuse, and attempts to reconstruct model behavior from outputs. What makes this case unusually loud is the reported scale and the fact that it touches a high-profile model provider.
OpenAI, Google, and Anthropic all face the same basic problem: useful model outputs are also valuable training data. Once an API can answer millions of questions, someone will try to turn those answers into a cheaper competitor.
- API abuse is common across major model vendors.
- Model distillation is a standard technique when used with permission.
- Unauthorized extraction becomes a security and legal issue.
- Scale is what turns a nuisance into a strategic threat.
For developers, the lesson is practical. API providers need better anomaly detection, stricter identity checks, and stronger usage auditing. Companies building on top of frontier models need to assume that output monitoring is now part of normal AI operations, not an edge case.
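One way to apply anomaly detection to fake-account networks is to look at groups of accounts rather than single accounts. The minimal sketch below groups request events by a shared fingerprint and flags fingerprints that tie many accounts to high combined volume. The fingerprint (for example, a payment-method hash or an IP subnet), the thresholds, and the function name are all hypothetical; production systems combine many more signals.

```python
from collections import defaultdict


def flag_account_clusters(events, min_accounts=50, min_requests=10_000):
    """
    Flag fingerprints shared by many accounts with high combined request volume.

    `events` is an iterable of (account_id, fingerprint) pairs, one per request.
    Returns {fingerprint: (distinct_accounts, total_requests)} for flagged groups.
    """
    accounts_by_fp = defaultdict(set)
    requests_by_fp = defaultdict(int)
    for account_id, fingerprint in events:
        accounts_by_fp[fingerprint].add(account_id)
        requests_by_fp[fingerprint] += 1

    return {
        fp: (len(accts), requests_by_fp[fp])
        for fp, accts in accounts_by_fp.items()
        if len(accts) >= min_accounts and requests_by_fp[fp] >= min_requests
    }


# Hypothetical traffic: 200 accounts sharing one fingerprint, 60 requests each,
# plus 200 ordinary accounts that each have their own fingerprint.
events = [(f"acct-{i}", "fp-shared") for i in range(200) for _ in range(60)]
events += [(f"user-{i}", f"fp-{i}") for i in range(200) for _ in range(60)]
print(flag_account_clusters(events))  # {'fp-shared': (200, 12000)}
```

Each account in the flagged group sends only 60 requests, no more than an ordinary user, so a per-account volume check would miss it; the pattern only appears once accounts are grouped.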
For users, this story is a reminder that the AI race is no longer only about benchmark scores and product launches. It is also about who can protect model access, prove misuse, and keep their training data clean enough to defend in public.
What happens next
If Anthropic publishes more evidence, this could become a reference case for how large-scale model extraction is investigated. If the claim weakens, it will still push AI companies to tighten account controls and logging. Either way, the next round of frontier model competition will likely include more monitoring, more friction, and more scrutiny over where training data comes from.
The real question is whether AI companies can keep scaling access without making abuse easier. That answer will shape pricing, API design, and trust for the next wave of model releases.