robots.txt for WordPress and AI Bots: Open the Right Pages Without Breaking GEO
How to configure robots.txt in WordPress for OAI-SearchBot, GPTBot, ClaudeBot, PerplexityBot, and other AI crawlers. What to allow, what to block, and how to measure AI visibility impact.
WordPress is excellent for publishing, but its default information architecture can be messy for AI crawlers. A typical site may expose tag archives, author archives, RSS feeds, search pages, pagination, UTM parameters, preview URLs, REST API endpoints, plugin routes, WooCommerce paths, and old attachment pages. If everything is open, AI bots waste crawl budget on noise. If everything is blocked, the site may lose the pages that could become cited sources.
The goal is not to "allow AI" or "block AI" as a single decision. The goal is to expose the pages that explain the brand, product, expertise, and commercial offer while keeping technical clutter out of the crawl path.
What AI Bots Should See on WordPress
For GEO, the most valuable WordPress URLs are usually:
- expert articles and guides;
- service and product pages;
- category pages with unique descriptions;
- FAQ and knowledge base pages;
- about, author, team, and contact pages;
- comparison pages, case studies, tutorials, and reviews;
- pricing pages, if the site has them.
AI systems do not browse like humans. They extract facts, entities, relationships, claims, and evidence. The cleaner the route to useful pages, the easier it is for the system to understand what the business does and when it should be recommended.
What to Block on a Typical WordPress Site
A conservative baseline often looks like this:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /*?s=
Disallow: /*?replytocom=
Disallow: /*preview=true
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://example.com/sitemap_index.xml

This is only a starting point. If category or tag pages contain useful editorial summaries, do not block them automatically. If WooCommerce category pages drive discovery, keep them open. If a headless WordPress setup renders the public site elsewhere, verify the actual URLs crawlers receive.
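The non-wildcard part of this baseline can be sanity-checked locally with Python's standard-library robots.txt parser. Two caveats for this sketch: `urllib.robotparser` does not understand wildcard patterns such as `/*?s=`, and it applies the first matching rule per agent, so the `Allow` line is listed before the broader `Disallow` here. The URLs are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Non-wildcard subset of the baseline. urllib.robotparser ignores
# wildcard patterns and applies the first rule that matches, so the
# narrow Allow line precedes the broad Disallow.
RULES = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /wp-login.php
"""

rp = RobotFileParser()
rp.parse(RULES.splitlines())

checks = {
    "https://example.com/guides/geo-basics/": True,       # editorial page stays open
    "https://example.com/wp-admin/": False,               # admin area blocked
    "https://example.com/wp-admin/admin-ajax.php": True,  # AJAX endpoint stays open
    "https://example.com/wp-login.php": False,            # login blocked
}
for url, expected in checks.items():
    allowed = rp.can_fetch("*", url)
    print(f"{url} -> {'allowed' if allowed else 'blocked'}")
    assert allowed == expected
```

Real crawlers may use longest-match precedence instead of first-match, so treat this as a quick local regression check, not a simulation of any particular bot.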
Separate Search, Training, and User Actions
The most common mistake is treating every AI-related user agent as the same bot. They are not. OAI-SearchBot is connected to ChatGPT search. GPTBot is associated with model training. User-triggered agents fetch a page only because a person requested it. PerplexityBot and ClaudeBot can be part of retrieval and answer-generation workflows.
If your policy allows AI search visibility but restricts training, use separate blocks:
User-agent: OAI-SearchBot
Allow: /
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /*preview=true
User-agent: GPTBot
Disallow: /
User-agent: PerplexityBot
Allow: /
Disallow: /wp-admin/
Disallow: /wp-login.php
User-agent: ClaudeBot
Allow: /
Disallow: /wp-admin/
Disallow: /wp-login.php

This is not a universal file for every business. It shows the policy pattern: search and training should not be collapsed into one broad rule.
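The split policy can be verified locally the same way. As before, this sketch reorders `Disallow` ahead of `Allow: /` because `urllib.robotparser` applies the first matching rule; the domain and paths are placeholders:

```python
from urllib.robotparser import RobotFileParser

# Per-agent policy: the search bot may crawl everything except admin
# paths, while the training bot is blocked entirely. Disallow lines
# come first because urllib.robotparser uses first-match semantics.
RULES = """\
User-agent: OAI-SearchBot
Disallow: /wp-admin/
Disallow: /wp-login.php
Allow: /

User-agent: GPTBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(RULES.splitlines())

print(rp.can_fetch("OAI-SearchBot", "https://example.com/pricing/"))   # search bot: allowed
print(rp.can_fetch("OAI-SearchBot", "https://example.com/wp-admin/"))  # admin path: blocked
print(rp.can_fetch("GPTBot", "https://example.com/pricing/"))          # training bot: blocked
```

The same pattern extends to PerplexityBot and ClaudeBot blocks: add an entry per agent and test each one separately rather than assuming a shared policy.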
WordPress Plugin Pitfalls
robots.txt may be generated by several layers:
- a physical file in the web root;
- WordPress virtual robots.txt;
- an SEO plugin;
- a security plugin;
- a CDN or reverse proxy;
- hosting-level rules.
That means a team can edit the expected file and still serve something else. After each change, open https://domain.com/robots.txt, check the HTTP status, review cache headers, and confirm the live content. If the site uses Cloudflare or another CDN, also check whether verified AI bots are challenged or blocked at the WAF layer.
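One way to automate that check is a small script that fetches the live file and diffs it against the rules the team believes it published. This is a standard-library sketch; the audit user agent, the header names inspected (CF-Cache-Status is Cloudflare-specific), and the expected rules are illustrative:

```python
import urllib.request

def fetch_robots(url):
    """Fetch the live robots.txt; return (status, headers, body)."""
    req = urllib.request.Request(url, headers={"User-Agent": "robots-audit/1.0"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status, dict(resp.headers), resp.read().decode("utf-8", "replace")

def missing_rules(served, expected):
    """Return the rules the team intended to publish that the live file lacks."""
    live = {line.strip() for line in served.splitlines()}
    return [rule for rule in expected if rule not in live]

# Usage sketch (network call, illustrative domain):
# status, headers, body = fetch_robots("https://example.com/robots.txt")
# print(status, headers.get("Cache-Control"), headers.get("CF-Cache-Status"))
# print(missing_rules(body, ["Disallow: /wp-admin/", "Disallow: /cart/"]))
```

If `missing_rules` reports anything, some layer between WordPress and the edge (plugin, CDN, hosting rule) is overriding the file you edited.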
WooCommerce Adds a Second Layer
For WordPress stores, the policy should be more granular. Do not block product pages, product categories, shipping, payment, return, and warranty pages. These URLs help AI answer questions such as "where can I buy it", "what is the delivery policy", "does the shop accept returns", and "which store is better".
A practical setup:
- /product/ remains open;
- /product-category/ remains open if descriptions are unique;
- /cart/, /checkout/, and /my-account/ are blocked;
- sorting and filter parameters are blocked or canonicalized;
- the sitemap lists only useful indexable commercial URLs.
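Expressed as a robots.txt fragment, that setup could look like the sketch below. The sorting and filter parameter names vary by theme and plugins, so treat the wildcard patterns as placeholders to adapt, and example.com as a stand-in domain:

```
User-agent: *
Allow: /product/
Allow: /product-category/
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /*?orderby=
Disallow: /*?filter_
Sitemap: https://example.com/sitemap_index.xml
```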
How to Validate the Result
Technical validation should include:
- Open /robots.txt and confirm the live rules.
- Check sitemap and canonical tags on priority pages.
- Review server logs by user agent.
- Confirm that CDN and WAF rules do not return 403 to important verified bots.
- Compare AI visibility after 2-4 weeks.
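Reviewing logs by user agent can be as simple as counting hits and status codes for known AI crawler tokens. Here is a sketch for combined-format access logs; the bot list reflects the agents discussed above, and the sample lines are fabricated for illustration:

```python
import re
from collections import Counter

# User-agent tokens for the AI crawlers discussed in this article.
AI_BOTS = ("OAI-SearchBot", "GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot")

# Extracts the status code and user-agent from a combined-format log line:
# ... "GET /path HTTP/1.1" 200 5120 "referer" "user-agent"
LINE_RE = re.compile(r'" (\d{3}) \S+ "[^"]*" "([^"]*)"')

def ai_hit_counts(log_lines):
    """Count (bot, status) pairs for requests from known AI crawlers."""
    counts = Counter()
    for line in log_lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        status, agent = m.groups()
        for bot in AI_BOTS:
            if bot in agent:
                counts[(bot, status)] += 1
    return counts

sample = [
    '203.0.113.7 - - [10/May/2025:12:00:00 +0000] "GET /pricing/ HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)"',
    '203.0.113.8 - - [10/May/2025:12:01:00 +0000] "GET /guides/ HTTP/1.1" 403 0 "-" '
    '"Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
]
print(ai_hit_counts(sample))
```

A spike of 403s for one bot is exactly the CDN/WAF problem from the previous step; zero hits for a bot you allowed suggests it is blocked upstream or has not discovered the site yet.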
The last step is essential. robots.txt is not the outcome. Better AI accessibility is the outcome. In GEO Scout, you can create prompt clusters around the brand, category, comparisons, and purchase intent, then track whether mentions and cited sources improve after WordPress access rules are cleaned up.
Mini Checklist
- Articles, services, products, FAQ, and trust pages are open.
- Admin, login, preview, internal search, cart, and private routes are blocked.
- OAI-SearchBot and GPTBot are handled separately.
- SEO plugins are not overwriting the intended policy.
- Sitemap points to useful indexable URLs.
- CDN and WAF rules do not block verified retrieval bots.
- Changes are evaluated in logs and AI answers, not only in validators.
robots.txt for WordPress is a knowledge-access policy. If AI systems should understand and cite the site, give them a clean path to the pages that carry the facts.
Frequently Asked Questions
Does a WordPress site need special robots.txt rules for AI bots?
Can I block GPTBot but stay visible in ChatGPT search?
Which WordPress paths are usually safe to block?
Can SEO plugins change robots.txt?
How do I know whether AI bots can access my WordPress site?
Related Articles
How to Configure CMS and Hosting for IndexNow and AI Bots
A technical guide to preparing your CMS and hosting for IndexNow and AI crawlers: robots.txt, sitemap, logs, caching, WAF, SSR, CDN, and monitoring.
SSR, SSG, and ISR for AI Crawlers: Why JavaScript-Only Sites Lose Visibility
Why many AI crawlers do not execute JavaScript and how SSR, SSG, and ISR make public content visible to ChatGPT, Claude, Perplexity, and Google AI.
OAI-SearchBot, GPTBot, and robots.txt: How to Control AI Access to Your Site
What OAI-SearchBot, GPTBot, and ChatGPT-User actually do, how to structure robots.txt rules without confusion, and how not to accidentally block ChatGPT search inclusion.