Cloudflare’s policy turns crawlers into paid access

OraCore Editors

[TOOLS] July 2, 2026 · 15 min read · OraCore Editors

Cloudflare’s new defaults block mixed-use AI crawlers and push publishers toward paid access controls.

I’ve been watching AI crawlers chew through publisher sites for a while now, and honestly, it’s been a mess. Search traffic, training traffic, agent traffic, all mixed together like nobody would ever have to sort it out later. That was always the fantasy. In practice, publishers got scraped, model vendors got free fuel, and the site owner got to eat the bandwidth bill. Great deal if you’re the crawler. Not so great if you’re the person paying for servers and trying to keep a newsroom alive.

What bothered me most was how sloppy the defaults were. If a crawler could claim “search” in one context and “AI” in another, the whole thing became a loophole factory. Site owners wanted discovery. They did not want their work turned into a free input pipeline for model training and agent behavior. Cloudflare’s move is interesting because it stops pretending those are the same thing. It forces the industry to separate intent, or at least admit when it won’t.

This is the first time I’ve seen a major infrastructure provider push the policy conversation down into the plumbing instead of leaving it at the robots.txt theater level. That matters. It changes what site owners can default to without becoming bot lawyers overnight.

I’m breaking down Sarah Perez’s TechCrunch write-up on Cloudflare’s July 1, 2026 announcement, because the actual mechanics are more useful than the headline. The key dates and product names are all there, and Cloudflare’s own framing makes the policy shift pretty explicit. I’m also linking the underlying companies where it helps: Cloudflare, You.com, and Ceramic.ai.

Cloudflare stopped treating every crawler like it had the same job

“Starting on September 15, 2026, Cloudflare’s default settings will block ‘mixed-use’ crawlers from any pages that host ads.”

What this actually means is simple: Cloudflare is changing the default behavior for a big chunk of the web it sits in front of. If a bot is trying to do search, agent work, and training all at once, Cloudflare now treats that as a problem instead of a convenience.

I like this move because it finally names the ugly middle ground. For years, bot vendors acted like “crawler” was one clean category. It isn’t. A search indexer, a training scraper, and an agent that needs live page access are not the same thing, even if they all look like HTTP requests from the edge.

The important part is the default. Site owners can still adjust settings, but Cloudflare is no longer making mixed-use access the path of least resistance. That’s the part that changes behavior at scale. Most people never touch the advanced settings. They inherit defaults and move on with their day.

Cloudflare says the change applies to new customers, new sites for existing customers, and all existing free customers. That is not a tiny rollout. That is Cloudflare trying to shift the baseline for a huge amount of public web traffic.

How to apply it: if I were running a publisher site, I’d start by inventorying which bots I actually want. Search discovery? Yes. AI training? Maybe not. Agent access to premium content? Only with a business reason. Then I’d check whether my CDN or bot management layer can separate those intents cleanly before a crawler hits the origin.

- Make a bot policy by intent, not by user-agent string alone.
- Separate search indexing from training and agent access in your controls.
- Review default settings before the September 15 cutoff.
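Here is a rough sketch of what that inventory could look like in code. It is not Cloudflare's implementation: `Intent`, `is_mixed_use`, and `default_decision` are names I made up, and the only rule borrowed from the announcement is that mixed-use crawlers get blocked by default on pages that host ads. Everything else is an assumption about how a site owner might want to decide.

```python
from enum import Enum


class Intent(Enum):
    SEARCH = "search"
    TRAINING = "training"
    AGENT = "agent"


def is_mixed_use(declared: set) -> bool:
    """In this sketch, a crawler is mixed-use if it declares more than one intent."""
    return len(declared) > 1


def default_decision(declared: set, page_has_ads: bool, wanted: set) -> str:
    """Return "allow" or "block" for one request.

    declared     -- intents the crawler says it has (hypothetical input)
    page_has_ads -- whether the requested page hosts ads
    wanted       -- intents the site owner decided to accept
    """
    if not declared:
        # Assumption: a crawler that discloses nothing is blocked.
        return "block"
    if is_mixed_use(declared) and page_has_ads:
        # The one rule taken from the announcement's default.
        return "block"
    if not declared <= wanted:
        # At least one declared intent is not on the site's wanted list.
        return "block"
    return "allow"


# Example: this site wants search discovery only.
site_wants = {Intent.SEARCH}
print(default_decision({Intent.SEARCH}, True, site_wants))
print(default_decision({Intent.SEARCH, Intent.TRAINING}, True, site_wants))
```

The point of keying on declared intent is that the same function keeps working when a vendor renames its bot. A user-agent allowlist does not.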

Cloudflare is betting publishers are done giving away content for free

Cloudflare’s announcement leans hard on a publisher pain point I’ve seen over and over: people want to be found, but they do not want to subsidize AI companies with free content extraction. That tension has been floating around for years, and most “solutions” were either too vague or too easy to ignore.

What Cloudflare is doing here is giving publishers a practical gate. If your site hosts ads, the default becomes stricter. That matters because ad-supported sites are often the ones most exposed to high-volume crawling. They need traffic, but they also need to protect the economics of the page itself.

The line that grabbed me was Cloudflare’s claim that website owners want content discoverable via search and often through AI services, but they also want protections against having intellectual property given away for free. That is the real business problem. Discovery is not the same as extraction. Visibility is not a license.

I ran into this exact issue on a content-heavy product site a while back. Search bots were fine. Some AI crawlers were not. The hard part wasn’t blocking everything. The hard part was deciding which access patterns were legitimate and which ones were just someone else building a product on top of our work without asking.

How to apply it: write a policy that says what each class of bot can do. If you’re a publisher, spell out whether you allow indexing, snippet generation, training, summarization, or agent retrieval. If you’re an AI vendor, stop hiding behind a single crawler identity and expose the actual purpose of the request.

- Publish a bot access policy in plain language.
- Decide whether ad-supported pages get different treatment than paid pages.
- Track which bot behavior creates value and which behavior just consumes content.
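The tracking step is the easiest one to start on, because it only needs logs. Below is a minimal sketch, assuming you can already tag each log record with a bot name and with whether it was a page fetch by the bot or a human visit referred by that bot's product. The function name `crawl_to_referral` and the two record labels are my own, not anything from Cloudflare.

```python
from collections import defaultdict


def crawl_to_referral(records):
    """Summarize fetches and referred visits per bot.

    `records` is an iterable of (bot_name, kind) pairs, where kind is
    "fetch" (the bot requested a page) or "referral" (a visitor arrived
    from that bot's product). Both labels are assumptions about how
    your logs are tagged.
    """
    counts = defaultdict(lambda: {"fetch": 0, "referral": 0})
    for bot, kind in records:
        if kind not in ("fetch", "referral"):
            raise ValueError(f"unknown record kind: {kind!r}")
        counts[bot][kind] += 1

    summary = {}
    for bot, c in counts.items():
        # None means the bot fetched pages but sent no visitors at all.
        ratio = c["fetch"] / c["referral"] if c["referral"] else None
        summary[bot] = {**c, "fetches_per_referral": ratio}
    return summary


# Hypothetical bot names and records, for illustration only.
sample = [
    ("example-search-bot", "fetch"),
    ("example-search-bot", "fetch"),
    ("example-search-bot", "referral"),
    ("example-training-bot", "fetch"),
]
for bot, stats in crawl_to_referral(sample).items():
    print(bot, stats)
```

A bot with a high or undefined fetches-per-referral number is consuming content without sending anything back, which is exactly the class of access worth pricing or blocking.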

Google’s split personality is the reason this got messy

Cloudflare’s announcement calls out the “world’s largest search engine,” which is obviously Google. The complaint is not subtle: Google gets access to more content than its rivals because it makes it hard for site owners to stay discoverable in search without also making their content usable by its AI systems.

Google pushed back by pointing to Google-Extended, a robots.txt control token (not a separate crawler) that site owners can use to opt out of Gemini model training and of AI products like Gemini Apps and the Vertex AI API without affecting inclusion in Google Search. That distinction matters, because Google is basically saying, “We already gave you a separation tool.” Cloudflare is saying, “Good, now everybody else should do the same thing.”

This is where the policy gets real. If one major search player can separate search crawling from AI use, then “it’s too hard” stops sounding convincing. The industry has had the technical vocabulary for a while. What it lacked was pressure to make the separation the default.

And yes, Googlebot still crawls for Search, including AI features like AI Overviews and AI Mode. That’s exactly the sort of blending publishers have been complaining about. If search and AI features are entangled at the crawl layer, site owners are left guessing where their content ends up.
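To make the separation concrete, here is a small check using Python's standard `urllib.robotparser`. The robots.txt content and the example.com URL are made up. As I read Google's crawler documentation, Google-Extended is a robots.txt token rather than a crawler with its own HTTP user-agent, so treat this as a demonstration of how per-token rules are evaluated, not of how Google crawls.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of Google-Extended, allow everything else.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/articles/some-story"
search_allowed = parser.can_fetch("Googlebot", url)
ai_allowed = parser.can_fetch("Google-Extended", url)

print(f"Googlebot: {search_allowed}, Google-Extended: {ai_allowed}")
```

Notice what this does not do: it leaves Googlebot fully allowed, so anything Googlebot feeds, including the AI features inside Search, is untouched by the opt-out.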

How to apply it: if you’re building an AI product, document your crawler purpose in a way a site owner can actually act on. If you’re a publisher, compare the promises from Google, Cloudflare, and whoever else is hitting your site. Do not accept “trust us” as policy.

Pay Per Crawl was the warm-up act, not the finish line

Cloudflare has already been pushing into this problem with its Pay Per Crawl marketplace, where websites can charge AI bots for scraping. The new announcement evolves that into “Pay Per Use,” which is a slightly less dumb name for a more useful idea.

What this actually means is that publishers should be able to charge when their content creates value, not just when it gets fetched. That’s a big shift. Fetching a page is one thing. Using that page to answer a query, generate a summary, or power an agent workflow is where the value transfer really happens.

I think this is where a lot of AI companies are going to get uncomfortable, because the old bargain was built on asymmetry. Crawl cheap, train cheap, monetize expensive. If the infrastructure layer starts letting publishers price access by use, that asymmetry gets harder to hide.

Cloudflare also says more than 50% of crawl traffic from AI crawlers is spent re-fetching unchanged pages. That is the kind of stat that makes people in infrastructure sit up straight. It means a lot of bot traffic is not even doing useful work. It’s just repeatedly asking for the same bytes.

How to apply it: if you run an AI system, measure how often you re-request unchanged content. If you’re a publisher, ask whether your bot traffic is actually producing distribution or just burning cycles. If you’re a platform vendor, build pricing around value creation instead of raw request volume.
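If you run a crawler, measuring this is not hard. The sketch below assumes you keep a fetch log of URL and response body pairs in fetch order. `refetch_waste` is a hypothetical helper of mine, and hashing the body is just one way to decide that a page is unchanged.

```python
import hashlib


def refetch_waste(fetch_log):
    """Return the share of fetches that returned unchanged content.

    `fetch_log` is an iterable of (url, body_bytes) tuples in the order
    the crawler fetched them. A fetch counts as wasted when its body
    hashes to the same value as the previous fetch of that URL.
    """
    last_seen = {}  # url -> digest of the most recent body
    total = 0
    wasted = 0
    for url, body in fetch_log:
        digest = hashlib.sha256(body).hexdigest()
        if last_seen.get(url) == digest:
            wasted += 1
        last_seen[url] = digest
        total += 1
    return wasted / total if total else 0.0


# Made-up log: /a is fetched three times and changes once, /b twice with no change.
log = [
    ("/a", b"v1"),
    ("/a", b"v1"),
    ("/a", b"v2"),
    ("/b", b"x"),
    ("/b", b"x"),
]
print(f"wasted share: {refetch_waste(log):.0%}")
```

Once you know the number, the standard fix on the crawler side is HTTP conditional requests (`If-None-Match` with an `ETag`, or `If-Modified-Since`), so an unchanged page costs a short 304 response instead of a full download.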

Cloudflare is turning policy into a product surface

This part matters more than people think. Cloudflare is not just publishing a blog post and hoping the market behaves. It is turning crawler policy into an operational control, which means site owners can enforce it without hiring a compliance team.

That’s why this announcement feels different from the usual “please respect robots.txt” sermon. Cloudflare sits in the middle of a lot of web traffic. When it changes defaults, it can make a policy real in a way a white paper never could.

It also means Cloudflare gets to shape the economics. If a publisher opts in, they can get paid when content appears in You.com search results or when You.com accesses premium content. Cloudflare says other AI companies can customize the model for how they work. That sounds flexible, but it also means Cloudflare is effectively building the negotiation layer between content owners and AI vendors.

I’ve seen enough platform shifts to know the pattern: once the edge provider becomes the gatekeeper, the policy becomes the product. That can be good for publishers, but it also means you should read the fine print like your revenue depends on it, because it does.

How to apply it: if you’re a publisher, treat bot monetization like an actual revenue channel and test it. If you’re an AI company, decide whether you want to be the vendor that pays cleanly or the one that keeps getting blocked by default settings.

The real test is whether mixed-use crawlers split up

Cloudflare says it hopes the default changes will encourage mixed-use crawlers to separate search from agent use and training. That is the real goal. Not punishment. Separation.

That sounds boring, but boring is good here. The web has been running on fuzzy crawler intent for too long. If a bot wants search access, fine. If it wants training data, say that. If it wants to act as an agent on behalf of a user, that should be visible too. One crawler pretending to be all three is exactly the kind of mess that creates distrust.

There’s also a practical benefit for AI vendors who are honest about their intent. Cloudflare explicitly says the new tools and partnerships benefit AI companies that have bots with clear and transparent intent. In other words: if your crawler is cleanly labeled, you may get access. If it’s a shape-shifter, expect friction.

That’s the policy lesson I’d take from this whole thing. The web is moving toward explicit access contracts, not implied permission. If your product depends on content, you need a story for why that content should be accessible, how it gets used, and what the publisher gets back.

How to apply it: audit your crawler stack. Split one bot into multiple bots if needed. Give each one a narrow purpose, clear documentation, and a separate policy posture. If you can’t explain the bot in one sentence, the publisher probably won’t trust it either.
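On the vendor side, the split can be as mechanical as generating one declaration per purpose. This is a sketch under my own assumptions: `BotDeclaration`, `split_crawler`, the user-agent naming scheme, and the docs URL layout are all hypothetical, not a format Cloudflare or anyone else has specified.

```python
from dataclasses import dataclass

PURPOSES = ("search", "training", "agent")


@dataclass(frozen=True)
class BotDeclaration:
    user_agent: str  # token the bot sends and that site rules can target
    company: str
    purpose: str     # exactly one of PURPOSES
    docs_url: str    # where a site owner can read what the bot does


def split_crawler(base_name, company, purposes, docs_base):
    """Turn one multi-purpose crawler into one declaration per purpose."""
    unknown = sorted(set(purposes) - set(PURPOSES))
    if unknown:
        raise ValueError(f"unsupported purposes: {unknown}")
    return [
        BotDeclaration(
            user_agent=f"{base_name}-{purpose.capitalize()}",
            company=company,
            purpose=purpose,
            docs_url=f"{docs_base}/{purpose}",
        )
        for purpose in PURPOSES  # fixed order, duplicates collapse
        if purpose in purposes
    ]


# Hypothetical vendor with one crawler that both trains and indexes.
for bot in split_crawler(
    "ExampleBot", "Example Inc", ["training", "search"], "https://example.com/bots"
):
    print(bot.user_agent, "->", bot.purpose)
```

Each resulting bot has one purpose and one documentation page, which is the one-sentence explanation a publisher can actually act on.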

The template you can copy

# Publisher AI crawler policy template

## Bot access policy

We allow search indexing for discovery.
We do not allow training use unless explicitly licensed.
We allow agent access only when it is required for a user-requested action.
We may charge for premium content access, summarization, or reuse.

## Allowed bot intents

- Search indexing
- Snippet generation for search
- Licensed training access
- User-requested agent retrieval

## Disallowed bot intents

- Hidden training crawls
- Mixed-use crawlers that do not disclose purpose
- Re-fetching unchanged pages without a valid reason
- Bulk extraction of premium content without permission

## Bot identification requirements

Each crawler must disclose:
- Bot name
- Company name
- Purpose: search, training, agent, or other
- Contact email
- Verification method
- Rate limits requested

## Access rules

### Public pages
Public pages may be indexed for search.
Public pages may be fetched by declared AI bots only if purpose is disclosed.

### Premium pages
Premium pages are blocked by default.
Premium pages may be licensed for training or agent access under written agreement.

### Ad-supported pages
Ad-supported pages are blocked from mixed-use crawlers by default.
Access may be granted only if the crawler separates search from training and agent use.

## Commercial terms

If content is used to create value in search results, summaries, or agent responses, access may require payment.
Pricing can be based on:
- Requests
- Pages accessed
- Unique content used
- Value created

## Re-fetch policy

Bots must not repeatedly re-fetch the same page unless:
- The page has changed since the last fetch
- The bot is validating freshness
- The bot is operating under a paid agreement

## Enforcement

Blocked bots may be denied at the edge.
Repeated violations may trigger rate limiting or permanent blocking.

## Notes for AI vendors

If your crawler serves multiple purposes, split it into separate bots.
If your intent is not clear, expect default blocking.
If you want access, disclose exactly how the content will be used.

This is the part I’d actually copy into a real site policy, then trim for your own business. The point is not perfect legal language. The point is to stop pretending every crawler is the same and force intent into the open.

If I were implementing this tomorrow, I’d start with the simplest possible split: search bots, training bots, and agent bots. Then I’d decide which content classes each one can touch, whether payment is required, and what counts as a violation. That’s enough structure to stop the chaos without turning your ops team into a courtroom.
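For that simplest split, the template's access rules fit in a lookup table. The sketch below is my own reading of the template, not an official format: `RULES`, `decide`, and the three verdict strings are hypothetical, and the ad-supported rows assume a single-purpose bot is treated like it would be on a public page.

```python
# (page_class, purpose) -> verdict, following the template's access rules.
RULES = {
    ("public", "search"): "allow",
    ("public", "training"): "needs_license",
    ("public", "agent"): "allow",
    ("premium", "search"): "block",
    ("premium", "training"): "needs_license",
    ("premium", "agent"): "needs_license",
    ("ad_supported", "search"): "allow",
    ("ad_supported", "training"): "needs_license",
    ("ad_supported", "agent"): "allow",
}


def decide(page_class, purposes, licensed=False):
    """Return "allow", "needs_license", or "block" for a declared bot."""
    purposes = set(purposes)
    if not purposes:
        return "block"  # undisclosed purpose
    if page_class == "ad_supported" and len(purposes) > 1:
        return "block"  # mixed-use crawler on an ad-supported page
    outcomes = set()
    for purpose in purposes:
        verdict = RULES[(page_class, purpose)]  # KeyError on unknown input
        if verdict == "needs_license" and licensed:
            verdict = "allow"
        outcomes.add(verdict)
    # The strictest verdict across all declared purposes wins.
    for verdict in ("block", "needs_license"):
        if verdict in outcomes:
            return verdict
    return "allow"


print(decide("public", {"search"}))
print(decide("premium", {"agent"}))
print(decide("premium", {"agent"}, licensed=True))
print(decide("ad_supported", {"search", "training"}, licensed=True))
```

A table like this is easy to review with non-engineers, and payment can hang off the `needs_license` verdict without changing the rest of the logic.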

Source: TechCrunch. This piece is my breakdown of Sarah Perez’s reporting and Cloudflare’s announcement, plus my own take on how developers and publishers can use the policy shift in practice.
