Technical Checklist: How to Prepare Your Website for AI Indexing
Complete technical checklist for preparing your website for AI bot indexing: robots.txt, llms.txt, JSON-LD, Schema.org, sitemap.xml, loading speed, full table of all AI bots and user agents.
Technical website preparation is the foundation of GEO optimization. Without it, expert content and external mentions lose effectiveness: AI systems simply will not be able to correctly read and index your website.
Complete table of AI bots and user agents
The first thing to know is which bots are trying to access your website. Here are the main AI user agents as of March 2026.
| Bot | Company | AI product | User-Agent | Purpose |
|---|---|---|---|---|
| GPTBot | OpenAI | ChatGPT | GPTBot/1.0 | Collecting data for training OpenAI models |
| ChatGPT-User | OpenAI | ChatGPT | ChatGPT-User | ChatGPT requests when browsing the web |
| OAI-SearchBot | OpenAI | ChatGPT Search | OAI-SearchBot/1.0 | OpenAI search index |
| ClaudeBot | Anthropic | Claude | ClaudeBot/1.0 | Indexing for Claude |
| PerplexityBot | Perplexity | Perplexity | PerplexityBot | Perplexity web search |
| Google-Extended | Google | Gemini | Google-Extended | Data for Gemini training |
| Googlebot | Google | AI Overview, AI Mode | Googlebot | Unified bot for search and AI |
| YandexBot | Yandex | Alice / Neurosearch | YandexBot/3.0 | Unified bot for search and AI |
| Bytespider | ByteDance | Doubao / TikTok AI | Bytespider | Indexing for ByteDance AI |
| CCBot | Common Crawl | Multiple LLMs | CCBot/2.0 | Data for model training |
| Amazonbot | Amazon | Alexa / Amazon AI | Amazonbot | Indexing for Amazon AI services |
| Applebot-Extended | Apple | Apple Intelligence | Applebot-Extended | Data for Apple AI |
| cohere-ai | Cohere | Command, Embed | cohere-ai | Indexing for Cohere AI |
| DeepSeekBot | DeepSeek | DeepSeek | DeepSeekBot | Indexing for DeepSeek |
| Meta-ExternalAgent | Meta | Meta AI | Meta-ExternalAgent/1.0 | Indexing for Meta AI |
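To see which of these bots already visit your site, you can count their tokens in your web server access log. The following is a minimal Python sketch. It assumes the common "combined" log format, where the user agent is the last quoted field, and the sample lines are made up. Google-Extended and Applebot-Extended are left out because they act only as robots.txt tokens and do not appear in request headers. User-agent strings can be spoofed, so a match shows what a client claimed to be, not who it actually was.

```python
import re
from collections import Counter
from typing import Iterable

# User-agent tokens from the table above (robots.txt-only tokens excluded).
AI_BOT_TOKENS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "PerplexityBot",
    "Googlebot", "YandexBot", "Bytespider", "CCBot", "Amazonbot",
    "cohere-ai", "DeepSeekBot", "Meta-ExternalAgent",
]

# In the combined log format the user agent is the last double-quoted field.
UA_AT_END = re.compile(r'"([^"]*)"\s*$')


def count_ai_bots(log_lines: Iterable[str]) -> Counter:
    """Count requests per AI bot token; lines without a known token are skipped."""
    counts: Counter = Counter()
    for line in log_lines:
        match = UA_AT_END.search(line)
        if not match:
            continue
        user_agent = match.group(1).lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in user_agent:
                counts[token] += 1
                break  # attribute each request to one bot only
    return counts


# Made-up sample lines in combined log format.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Mar/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.6 - - [01/Mar/2026:10:00:05 +0000] "GET /blog/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.7 - - [01/Mar/2026:10:01:00 +0000] "GET /pricing HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

print(count_ai_bots(SAMPLE_LOG))
```

To run it against a real log, pass an open file object, for example `count_ai_bots(open(path))`. Here `path` is wherever your server writes its access log, which depends on your setup.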
1. robots.txt: access control for AI bots
What to check
Open your-site.com/robots.txt and check if there are any blocks for AI bots.
Problematic configurations
# BAD: blocks all AI bots
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
If you see rules like these, AI systems cannot fetch up-to-date data from your website. ChatGPT, Claude, and Perplexity will then have to rely on third-party sources when answering questions about you.
Recommended configuration
# Allow AI bots access to public pages
User-agent: GPTBot
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /account/
User-agent: ClaudeBot
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /account/
User-agent: PerplexityBot
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /account/
User-agent: Google-Extended
Allow: /
User-agent: CCBot
Allow: /
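You can check rules like these with Python's standard `urllib.robotparser` module before deploying them. There is one caveat. This parser applies the rules in a group in file order and stops at the first match, whereas RFC 9309 (and Google) use the most specific matching rule. In the configuration above, `Allow: /` comes first, so `urllib.robotparser` would report `/admin/` as allowed for GPTBot even though a longest-match parser would block it. The sketch below therefore lists the Disallow lines first, an order that gives the same result under both interpretations. The example.com URLs and the path list are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The "bad" configuration from above (GPTBot group only).
BAD = """\
User-agent: GPTBot
Disallow: /
"""

# The recommended GPTBot group, with Disallow lines placed before Allow: /
# so that first-match and longest-match parsers agree.
RECOMMENDED = """\
User-agent: GPTBot
Disallow: /admin/
Disallow: /api/
Disallow: /account/
Allow: /
"""

PATHS = ["/", "/blog/", "/admin/", "/api/users"]


def blocked_paths(robots_txt: str, agent: str, paths: list[str]) -> list[str]:
    """Return the paths that `agent` is not allowed to fetch under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, "https://example.com" + p)]


print("BAD:", blocked_paths(BAD, "GPTBot", PATHS))
# prints: BAD: ['/', '/blog/', '/admin/', '/api/users']
print("RECOMMENDED:", blocked_paths(RECOMMENDED, "GPTBot", PATHS))
# prints: RECOMMENDED: ['/admin/', '/api/users']
```

A bot with no matching group and no `User-agent: *` group is allowed everywhere. For example, `blocked_paths(BAD, "ClaudeBot", PATHS)` returns an empty list.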
What to block
- /admin/, /api/, /account/: internal sections
- Pages with user personal data
- Internal tools and dashboards
- Duplicate content (print versions, AMP pages without canonical)
What NOT to block
- Homepage, service and product pages
- Blog and expert articles
- FAQ sections
- About page
- Case studies and portfolio
2. llms.txt: instructions for AI
What it is
llms.txt is a file at the root of the website that provides AI systems with structured information about the company and the site. It is not a W3C standard, but a practice gaining traction in the AI community.
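llms.txt is plain Markdown: an H1 with the company name, a one-line blockquote summary, then H2 sections with link lists, as in the format below. Because of that, it can be generated from data you already maintain instead of being edited by hand. The following is a minimal Python sketch. The `Page` type and function names are hypothetical. The 500-word check mirrors the limit in the checklist at the end of this article rather than any formal requirement, and it splits on whitespace, so the count is an approximation.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    description: str


def build_llms_txt(name: str, summary: str, about: str, pages: list[Page]) -> str:
    """Assemble an llms.txt body: H1 name, blockquote summary, About and Key pages sections."""
    lines = [f"# {name}", f"> {summary}", "", "## About", about, "", "## Key pages"]
    lines += [f"- [{p.title}]({p.url}): {p.description}" for p in pages]
    return "\n".join(lines) + "\n"


def within_word_limit(text: str, limit: int = 500) -> bool:
    """Rough whitespace-based word count, compared against the checklist limit."""
    return len(text.split()) <= limit


llms_txt = build_llms_txt(
    name="Company Name",
    summary="Brief company description in one sentence.",
    about="Extended description: what it does, for whom, key advantages.",
    pages=[
        Page("Pricing", "https://example.com/pricing", "Plans and prices"),
        Page("FAQ", "https://example.com/faq", "Answers to common questions"),
    ],
)
print(llms_txt)
```

Save the result as llms.txt in the web root so that it is served from the site root.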
File format
# Company Name
> Brief company description in one sentence.
## About
Extended description: what it does, for whom, key advantages.
## Key pages
- [Product/Service](https://example.com/product): Description
- [Pricing](https://example.com/pricing): Description
- [About](https://example.com/about): Description
- [Blog](https://example.com/blog): Description
- [FAQ](https://example.com/faq): Description
## Contacts
- Website: https://example.com
- Email: info@example.com
- Phone: +1 (xxx) xxx-xx-xxPractical example
# GEO Scout
> Brand visibility monitoring platform across 9 AI providers.
## About
GEO Scout is a full-cycle GEO (Generative Engine Optimization) platform.
Daily monitoring of brand presence in ChatGPT, Claude, DeepSeek,
Gemini, Google AI Mode, Google AI Overview, Grok, Perplexity, and Yandex with Alice.
## Key pages
- [Home](https://geoscout.pro): Platform overview and features
- [Pricing](https://geoscout.pro/pricing): Plans and prices
- [Blog](https://geoscout.pro/blog): Expert articles on GEO
- [Ratings](https://geoscout.pro/ratings): Public AI brand visibility ratings
3. JSON-LD / Schema.org: structured data
Structured data helps AI systems understand page content accurately. This is a critical factor for GEO: AI systems are more likely to cite information they can interpret unambiguously.
Priority markup types
| Type | Where to use | Impact on AI |
|---|---|---|
| Organization | Homepage, About page | AI gets basic brand information |
| Product | Product pages | AI can recommend specific products |
| Service | Service pages | AI understands what you offer |
| FAQPage | FAQ sections, articles with FAQ | AI extracts ready-made answers to questions |
| Article | Blog, expert articles | AI evaluates authorship and expertise |
| HowTo | Guides, instructions | AI cites step-by-step instructions |
| Review / AggregateRating | Product pages, reviews | AI conveys ratings and opinions |
| LocalBusiness | Contacts, branch listings | AI recommends in local queries |
| BreadcrumbList | All pages | AI understands site structure |
| SoftwareApplication | SaaS products | AI correctly classifies the product |
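JSON-LD is ordinary JSON, so generating it with a serializer avoids the quoting and trailing-comma mistakes that hand-written markup tends to accumulate. The following is a minimal Python sketch that builds the FAQPage structure shown below and wraps it in a `<script type="application/ld+json">` tag. The function names are hypothetical. The `replace` call escapes `</` so that an answer containing `</script>` cannot close the tag early. Since `\/` is a valid JSON escape for `/`, the data still parses unchanged.

```python
import json


def faq_jsonld(qa_pairs: list[tuple[str, str]]) -> dict:
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


def to_script_tag(data: dict) -> str:
    """Serialize structured data into an HTML script tag for the page."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so a value containing "</script>" cannot terminate the tag.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


faq = faq_jsonld([("Customer question?", "Specific answer with facts and figures.")])
print(to_script_tag(faq))
```

The same pattern extends to the other types in the table. Build a dict with the relevant schema.org properties, for example for Organization or BreadcrumbList, and reuse `to_script_tag`.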
Example: Organization
{
"@context": "https://schema.org",
"@type": "Organization",
"name": "Company Name",
"url": "https://example.com",
"logo": "https://example.com/logo.png",
"description": "Brief description with key facts",
"foundingDate": "2023",
"numberOfEmployees": {
"@type": "QuantitativeValue",
"value": 50
},
"sameAs": [
"https://t.me/company",
"https://vk.com/company"
]
}
Example: FAQPage
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "Customer question?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Specific answer with facts and figures."
}
}
]
}
4. Sitemap.xml: map for AI bots
Basic requirements
- File accessible at your-site.com/sitemap.xml
- Specified in robots.txt: Sitemap: https://your-site.com/sitemap.xml
- Contains all public pages that AI should see
- <lastmod> tags are current (not static dates)
- Each file stays within 50,000 URLs and 50 MB uncompressed (large sites use a sitemap index)
Page priority for AI
Not all pages are equally important for AI indexing. Prioritize:
- Homepage
- Service/product pages
- FAQ sections
- Expert articles and guides
- About page
- Case studies with data
- Pricing pages
Common mistakes
- Sitemap contains pages blocked in robots.txt
- Outdated <lastmod> values (AI systems with web search prefer fresh content)
- Missing sitemap index for sites with 10,000+ pages
- Broken URLs in sitemap
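Generating sitemap.xml from your content database, rather than maintaining a static file, keeps `<lastmod>` tied to real modification dates. The following is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The page list is a placeholder for wherever your CMS stores URLs and update dates. The function refuses to build a single file above the 50,000-URL limit; at that point, split the URLs across several files listed in a sitemap index.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000


def build_sitemap(pages: list[tuple[str, date]]) -> str:
    """Return sitemap XML for (absolute URL, last modified date) pairs."""
    if len(pages) > MAX_URLS_PER_FILE:
        raise ValueError("too many URLs for one file: split them and use a sitemap index")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & and <
        # Use the page's real modification date, not the build date.
        ET.SubElement(url, "lastmod").text = modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap([
    ("https://example.com/", date(2026, 3, 1)),
    ("https://example.com/faq", date(2026, 2, 14)),
]))
```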
5. Loading speed and Core Web Vitals
AI bots, like search crawlers, prefer fast websites. In addition, some AI systems (Perplexity, Google AI) show page previews, so a slow site creates a bad impression.
Target metrics
| Metric | Good | Acceptable | Poor |
|---|---|---|---|
| LCP (Largest Contentful Paint) | < 2.5 sec | 2.5-4.0 sec | > 4.0 sec |
| INP (Interaction to Next Paint) | < 200 ms | 200-500 ms | > 500 ms |
| CLS (Cumulative Layout Shift) | < 0.1 | 0.1-0.25 | > 0.25 |
| TTFB (Time to First Byte) | < 800 ms | 800-1800 ms | > 1800 ms |
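The thresholds in the table translate directly into a small classification helper, which is handy when post-processing field data from PageSpeed Insights or your own monitoring. The following is a minimal Python sketch. Units follow the table: LCP in seconds, INP and TTFB in milliseconds, CLS unitless. Boundary values are handled as the table writes them, so a value equal to the lower threshold counts as Acceptable. The sample measurements are made up.

```python
# (lower, upper) thresholds per metric, in the table's units:
# LCP in seconds, INP and TTFB in milliseconds, CLS unitless.
THRESHOLDS = {
    "LCP": (2.5, 4.0),
    "INP": (200, 500),
    "CLS": (0.1, 0.25),
    "TTFB": (800, 1800),
}


def rate(metric: str, value: float) -> str:
    """Classify a measurement as Good, Acceptable, or Poor using the table above."""
    lower, upper = THRESHOLDS[metric]
    if value < lower:
        return "Good"
    if value <= upper:
        return "Acceptable"
    return "Poor"


sample = {"LCP": 1.8, "INP": 650, "CLS": 0.12, "TTFB": 800}
for metric, value in sample.items():
    print(metric, value, rate(metric, value))
```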
Quick optimizations
- Enable compression (gzip/brotli)
- Configure static asset caching
- Optimize images (WebP/AVIF, lazy loading)
- Minify CSS and JavaScript
- Use a CDN
The website GEO audit in GEO Scout automatically checks loading speed and Core Web Vitals via the PageSpeed API.
6. Mobile optimization
70%+ of requests to AI assistants come from mobile devices (especially voice queries to Alice). If your mobile site performs poorly, users who follow an AI recommendation to it will have a negative experience.
Requirements
- Responsive design (not a separate m.site)
- Text readable without zooming
- Buttons and links with adequate touch targets (minimum 44x44 px)
- Forms adapted for mobile input
- No horizontal scrolling
7. Meta tags and content markup
Title and description
AI systems use meta tags for quick assessment of page content.
<title>Short, specific title with brand — up to 60 characters</title>
<meta name="description" content="Description with key facts and figures.
Specifics instead of generic phrases. Up to 160 characters.">
Canonical URL
Mandatory for all pages. AI bots may index multiple versions of the same page (http/https, www/non-www, with/without parameters). Canonical specifies the primary version.
<link rel="canonical" href="https://example.com/page">
Open Graph and Twitter Cards
AI systems that work with social data (Grok) consider OG tags. Fill in:
<meta property="og:title" content="Title">
<meta property="og:description" content="Description">
<meta property="og:image" content="Image URL">
<meta property="og:type" content="website">
<meta name="twitter:card" content="summary_large_image">
H1-H3 headings
Heading hierarchy is critical for AI because neural networks use it to understand content structure:
- H1 — one per page, contains the main topic
- H2 — main sections (AI often cites content by H2)
- H3 — subsections with specifics
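The meta-tag and heading rules in this section can be checked with Python's built-in `html.parser`, without a browser. The following is a minimal sketch. The class and function names are hypothetical, and the length limits (60 and 160 characters) come from the recommendations above. The parser only inspects the HTML as served, so it will not see tags injected later by JavaScript.

```python
from html.parser import HTMLParser


class MetaAudit(HTMLParser):
    """Collect title, description, canonical, Open Graph tags and the H1 count."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = None  # content of <meta name="description">
        self.canonical = None    # href of <link rel="canonical">
        self.og = {}             # og:* property -> content
        self.h1_count = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("property") or ""
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and a.get("name") == "description":
            self.description = a.get("content") or ""
        elif tag == "meta" and prop.startswith("og:"):
            self.og[prop] = a.get("content") or ""
        elif tag == "link" and a.get("rel") == "canonical":
            self.canonical = a.get("href")
        elif tag == "h1":
            self.h1_count += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit(html: str) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    p = MetaAudit()
    p.feed(html)
    p.close()
    problems = []
    title = p.title.strip()
    if not title or len(title) > 60:
        problems.append("title missing or longer than 60 characters")
    if not p.description or len(p.description) > 160:
        problems.append("description missing or longer than 160 characters")
    if not p.canonical:
        problems.append("canonical URL missing")
    for key in ("og:title", "og:description", "og:image", "og:type"):
        if key not in p.og:
            problems.append(f"{key} missing")
    if p.h1_count != 1:
        problems.append(f"expected one H1, found {p.h1_count}")
    return problems


PAGE = """<html><head>
<title>Example Brand: GEO audit</title>
<meta name="description" content="Checks robots.txt, JSON-LD and Core Web Vitals.">
<link rel="canonical" href="https://example.com/page">
<meta property="og:title" content="Title">
<meta property="og:description" content="Description">
<meta property="og:image" content="https://example.com/og.png">
<meta property="og:type" content="website">
</head><body><h1>Main topic</h1><h2>Section</h2></body></html>"""

print(audit(PAGE))
```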
Automated audit: what GEO Scout checks
GEO Scout includes an automatic website GEO audit that checks all technical factors from this checklist:
- robots.txt — accessibility for AI bots
- Schema.org — presence and correctness of JSON-LD markup
- PageSpeed — Core Web Vitals and loading speed
- Meta tags — title, description, canonical, OG
- Mobile adaptation — responsive and touch-friendly
- SSL — presence and validity of certificate
- Sitemap — presence and freshness
Audit results automatically flow into the Command Center, where AI prioritizes technical tasks by their impact on visibility in neural networks. Technical issues with high impact (for example, GPTBot blocking in robots.txt) receive maximum priority.
Checklist: website technical readiness for AI
robots.txt
- robots.txt file exists and is accessible
- GPTBot is not blocked
- ClaudeBot is not blocked
- PerplexityBot is not blocked
- Google-Extended is not blocked
- Internal sections are closed (/admin, /api, /account)
- Path to sitemap.xml is specified
llms.txt
- llms.txt file is created and placed at the site root
- Contains a brief company description
- Key pages with links are listed
- Length does not exceed 500 words
Structured data (JSON-LD)
- Organization — on homepage and About page
- Product / Service — on product and service pages
- FAQPage — on FAQ section and in articles
- Article — on expert articles (with author specified)
- BreadcrumbList — on all pages
- Markup is valid (verify via Google Rich Results Test)
Sitemap.xml
- File exists and is accessible
- Contains all public pages
- <lastmod> dates are current
- No broken URLs
- Specified in robots.txt
Speed and performance
- LCP < 2.5 seconds
- INP < 200 ms
- CLS < 0.1
- Compression enabled (gzip/brotli)
- Images optimized (WebP/AVIF)
- Static caching configured
Mobile optimization
- Responsive design
- Text readable without zoom
- Touch targets >= 44x44 px
- No horizontal scrolling
Meta tags and markup
- Title on every page (unique, up to 60 characters)
- Description on every page (with facts, up to 160 characters)
- Canonical URL on every page
- Open Graph tags filled in
- H1-H3 hierarchy correct (one H1 per page)
- SSL certificate valid
Frequently asked questions
Which AI bots index websites?
What is llms.txt and does my website need it?
Should I allow AI bots to index my website?
Which Schema.org types are most important for AI?
Does website speed affect AI visibility?
How can I check if AI is indexing my website?
How is technical GEO different from technical SEO?