
robots.txt for WordPress and AI Bots: Open the Right Pages Without Breaking GEO

How to configure robots.txt in WordPress for OAI-SearchBot, GPTBot, ClaudeBot, PerplexityBot, and other AI crawlers. What to allow, what to block, and how to measure AI visibility impact.

robots.txt, WordPress, AI bots, GPTBot
Vladislav Puchkov
Founder of GEO Scout, GEO optimization expert

WordPress is excellent for publishing, but its default information architecture can be messy for AI crawlers. A typical site may expose tag archives, author archives, RSS feeds, search pages, pagination, UTM parameters, preview URLs, REST API endpoints, plugin routes, WooCommerce paths, and old attachment pages. If everything is open, AI bots waste crawl budget on noise. If everything is blocked, the site may lose the pages that could become cited sources.

The goal is not to "allow AI" or "block AI" as a single decision. The goal is to expose the pages that explain the brand, product, expertise, and commercial offer while keeping technical clutter out of the crawl path.

What AI Bots Should See on WordPress

For GEO, the most valuable WordPress URLs are usually:

  • expert articles and guides;
  • service and product pages;
  • category pages with unique descriptions;
  • FAQ and knowledge base pages;
  • about, author, team, and contact pages;
  • comparison pages, case studies, tutorials, and reviews;
  • pricing pages, if the site has them.

AI systems do not browse like humans. They extract facts, entities, relationships, claims, and evidence. The cleaner the route to useful pages, the easier it is for the system to understand what the business does and when it should be recommended.

What to Block on a Typical WordPress Site

A conservative baseline often looks like this:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /*?s=
Disallow: /*?replytocom=
Disallow: /*preview=true
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Allow: /wp-admin/admin-ajax.php
 
Sitemap: https://example.com/sitemap_index.xml

This is only a starting point. If category or tag pages contain useful editorial summaries, do not block them automatically. If WooCommerce category pages drive discovery, keep them open. If a headless WordPress setup renders the public site elsewhere, verify the actual URLs crawlers receive.
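A quick way to sanity-check the live policy is Python's built-in robots.txt parser. The sketch below is a minimal example: the example.com host and the test URLs are placeholders, so swap in the host and paths your crawlers actually receive, which matters especially for headless setups.

from urllib.robotparser import RobotFileParser

ROBOTS_URL = "https://example.com/robots.txt"  # placeholder host
TEST_CASES = [
    ("OAI-SearchBot", "https://example.com/blog/some-guide/"),
    ("GPTBot", "https://example.com/blog/some-guide/"),
    ("ClaudeBot", "https://example.com/wp-admin/"),
]

parser = RobotFileParser()
parser.set_url(ROBOTS_URL)
parser.read()  # fetches and parses the live file

for agent, url in TEST_CASES:
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent:15} {url} -> {verdict}")

If a URL you expect to be open comes back as blocked, the served file differs from the one you edited.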

Separate Search, Training, and User Actions

The most common mistake is treating every AI-related user agent as the same bot. They are not. OAI-SearchBot is used for ChatGPT search. GPTBot is associated with model training. User-triggered agents such as ChatGPT-User fetch a page only because a person requested it. PerplexityBot and ClaudeBot can be part of retrieval and answer generation workflows.

If your policy allows AI search visibility but restricts training, use separate blocks:

User-agent: OAI-SearchBot
Allow: /
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /*preview=true
 
User-agent: GPTBot
Disallow: /
 
User-agent: PerplexityBot
Allow: /
Disallow: /wp-admin/
Disallow: /wp-login.php
 
User-agent: ClaudeBot
Allow: /
Disallow: /wp-admin/
Disallow: /wp-login.php

This is not a universal file for every business. It shows the policy pattern: search and training should not be collapsed into one broad rule.

WordPress Plugin Pitfalls

robots.txt may be generated by several layers:

  • a physical file in the web root;
  • WordPress virtual robots.txt;
  • an SEO plugin;
  • a security plugin;
  • a CDN or reverse proxy;
  • hosting-level rules.

That means a team can edit one layer while another layer keeps serving something else. After each change, open https://example.com/robots.txt, check the HTTP status, review cache headers, and confirm the live content. If the site uses Cloudflare or another CDN, also check whether verified AI bots are challenged or blocked at the WAF layer.
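A minimal stdlib sketch of that check, again with example.com as a placeholder:

import urllib.request

req = urllib.request.Request(
    "https://example.com/robots.txt",
    headers={"User-Agent": "robots-check/1.0"},  # arbitrary diagnostic label
)
with urllib.request.urlopen(req, timeout=10) as resp:
    print("HTTP status:", resp.status)
    # cache headers reveal whether a CDN is serving a stale copy;
    # CF-Cache-Status is Cloudflare-specific and absent elsewhere
    for header in ("Cache-Control", "CF-Cache-Status", "Age", "ETag"):
        print(f"{header}: {resp.headers.get(header)}")
    body = resp.read().decode("utf-8", errors="replace")
    print("\n".join(body.splitlines()[:20]))  # preview the live rules

If the preview does not match the file you edited, work down the layers listed above until you find which one wins.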

WooCommerce Adds a Second Layer

For WordPress stores, the policy should be more granular. Do not block product pages, product categories, shipping, payment, return, and warranty pages. These URLs help AI answer questions such as "where can I buy it", "what is the delivery policy", "does the shop accept returns", and "which store is better".

A practical setup, with a robots.txt sketch after the list:

  • /product/ remains open;
  • /product-category/ remains open if descriptions are unique;
  • /cart/, /checkout/, and /my-account/ are blocked;
  • sorting and filter parameters are blocked or canonicalized;
  • the sitemap lists only useful indexable commercial URLs.
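Translated into robots.txt, that setup could look like the sketch below. The query parameters shown (orderby, filter_, add-to-cart) are common WooCommerce defaults, so confirm them against the store's real URLs, and remember that parameters appearing after & rather than ? need matching pattern variants.

User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /*?orderby=
Disallow: /*?filter_
Disallow: /*?add-to-cart=
 
Sitemap: https://example.com/sitemap_index.xml

Product and product-category pages stay open simply by not being disallowed; canonical tags should still handle any filtered variants that slip through.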

How to Validate the Result

Technical validation should include:

  1. Open /robots.txt and confirm the live rules.
  2. Check sitemap and canonical tags on priority pages.
  3. Review server logs by user agent (a parsing sketch follows this list).
  4. Confirm that CDN and WAF rules do not return 403 to important verified bots.
  5. Compare AI visibility after 2-4 weeks.
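For step 3, here is a short log-review sketch. The log path and the combined-format regex are assumptions; adjust both to the actual server setup.

import re
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"  # assumption: nginx combined log
AI_AGENTS = ("OAI-SearchBot", "GPTBot", "ChatGPT-User",
             "PerplexityBot", "ClaudeBot")

# combined format: ... "METHOD /path HTTP/x" status size "referer" "user-agent"
LINE_RE = re.compile(r'"\w+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) .*"(?P<ua>[^"]*)"$')

hits = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        m = LINE_RE.search(line)
        if not m:
            continue
        for agent in AI_AGENTS:
            if agent in m["ua"]:
                hits[(agent, m["status"])] += 1

for (agent, status), count in sorted(hits.items()):
    # a run of 403s for a verified bot points at the CDN or WAF, not robots.txt
    print(f"{agent:15} {status} {count}")

This covers steps 3 and 4 in one pass: it shows which AI agents actually arrive and which status codes they receive.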

The last step is essential. robots.txt is not the outcome. Better AI accessibility is the outcome. In GEO Scout, you can create prompt clusters around the brand, category, comparisons, and purchase intent, then track whether mentions and cited sources improve after WordPress access rules are cleaned up.

Mini Checklist

  • Articles, services, products, FAQ, and trust pages are open.
  • Admin, login, preview, internal search, cart, and private routes are blocked.
  • OAI-SearchBot and GPTBot are handled separately.
  • SEO plugins are not overwriting the intended policy.
  • Sitemap points to useful indexable URLs.
  • CDN and WAF rules do not block verified retrieval bots.
  • Changes are evaluated in logs and AI answers, not only in validators.

robots.txt for WordPress is a knowledge-access policy. If AI systems should understand and cite the site, give them a clean path to the pages that carry the facts.

Frequently Asked Questions

Does a WordPress site need special robots.txt rules for AI bots?
Yes, if AI visibility matters. WordPress often creates technical URLs, archives, parameters, previews, and plugin-generated paths. robots.txt should keep useful public content open while reducing crawler noise.
Can I block GPTBot but stay visible in ChatGPT search?
Yes. The policy should separate user agents. GPTBot is associated with model training, while OAI-SearchBot is associated with ChatGPT search.
Which WordPress paths are usually safe to block?
Common candidates are /wp-admin/, login pages, internal search, preview URLs, reply parameters, carts, checkout, and account pages. Public articles, service pages, product pages, FAQ, and categories with useful content should usually remain open.
Can SEO plugins change robots.txt?
Yes. Yoast SEO, Rank Math, All in One SEO, security plugins, hosting panels, and CDN rules can generate or override robots.txt. Always verify the live file at /robots.txt.
How do I know whether AI bots can access my WordPress site?
Check server logs, CDN logs, WAF events, and AI visibility metrics. GEO Scout can show whether mentions, cited sources, and Domain Citation Rate change after access rules are updated.