llms.txt for Next.js: Implementation Checklist for AI Crawler Readiness
How to add llms.txt, robots.txt, sitemap, canonical tags, structured data, and server-rendered content to a Next.js site for AI crawlers.
Next.js can be excellent for AI crawler readiness, but only if public content is server-visible. A crawler should be able to request your homepage, feature pages, docs, case studies, and blog posts and receive useful HTML without running a browser.
GEO Scout helps teams treat this as an implementation workflow, not a theory. After shipping the checklist below, track prompts and sources in geoscout.pro to see whether AI systems cite the pages you exposed.
File Placement
For static content, use:
public/llms.txt
public/robots.txt
public/sitemap.xml

For generated content in the App Router, use route handlers:
app/llms.txt/route.ts
app/robots.txt/route.ts
app/sitemap.ts

The root URLs should resolve directly:
https://example.com/llms.txt
https://example.com/robots.txt
https://example.com/sitemap.xml

Minimal llms.txt
Keep it short and operational:
# Example SaaS
> Product documentation, pricing, case studies, and implementation resources for Example SaaS.
## Core pages
- https://example.com/
- https://example.com/features
- https://example.com/pricing
- https://example.com/customers
## Documentation
- https://example.com/docs/getting-started
- https://example.com/docs/api
## Policies
- https://example.com/security
- https://example.com/privacy

Do not turn llms.txt into a keyword dump. It should help crawlers find canonical, useful public information.
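If you generate llms.txt instead of committing a static file, a minimal route handler sketch could look like this (the page list is illustrative, not a recommendation for your site):

```typescript
// app/llms.txt/route.ts — sketch of a generated llms.txt endpoint.
export async function GET(): Promise<Response> {
  const body = [
    '# Example SaaS',
    '> Product documentation, pricing, case studies, and implementation resources for Example SaaS.',
    '',
    '## Core pages',
    '- https://example.com/',
    '- https://example.com/pricing',
  ].join('\n')

  return new Response(body, {
    headers: {
      // Serve as plain text so crawlers parse it directly.
      'Content-Type': 'text/plain; charset=utf-8',
      // Cache at the CDN so crawler hits stay cheap.
      'Cache-Control': 'public, max-age=3600',
    },
  })
}
```

The advantage of the route handler is that the URL list can be built from the same data source as your sitemap, so the two never drift apart.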
robots.txt
Allow the AI crawlers you want to access public content, block private or duplicate paths, and include the sitemap:
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: *
Disallow: /app/
Disallow: /api/
Disallow: /checkout/
Sitemap: https://example.com/sitemap.xml

Blocking /api/ is usually correct. Blocking /docs/, /blog/, or /customers/ is usually harmful for GEO.
Sitemap
In app/sitemap.ts, include all canonical public pages:
import type { MetadataRoute } from 'next'

export default function sitemap(): MetadataRoute.Sitemap {
  return [
    {
      url: 'https://example.com/',
      lastModified: new Date('2026-04-24'),
      changeFrequency: 'weekly',
      priority: 1,
    },
    {
      url: 'https://example.com/features',
      lastModified: new Date('2026-04-24'),
      changeFrequency: 'weekly',
      priority: 0.8,
    },
  ]
}

Use stable canonical URLs. If localized pages exist, connect them with alternates.languages in metadata and hreflang in sitemaps where your stack supports it.
Rendering Checklist
- Use SSG or ISR for blog, docs, feature pages, pricing, comparisons, and case studies.
- Use SSR only where public content depends on request-time data.
- Keep private dashboards behind auth and out of crawler paths.
- Render H1, body copy, FAQ, tables, and JSON-LD before hydration.
- Avoid hiding critical content behind tabs that are empty in the initial HTML.
- Validate with curl -A "GPTBot/1.0" https://example.com/features.
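The curl check above can be automated. A sketch of a small audit that fetches pages with a bot user agent and confirms crawler-critical markup is present in the raw, pre-hydration HTML (the marker list is illustrative):

```typescript
// Return the markers that do NOT appear in the raw HTML string.
export function findMissingMarkers(html: string, markers: string[]): string[] {
  return markers.filter((marker) => !html.includes(marker))
}

// Fetch a page the way a bot would and audit the un-hydrated HTML.
export async function auditPage(url: string): Promise<string[]> {
  const res = await fetch(url, { headers: { 'User-Agent': 'GPTBot/1.0' } })
  const html = await res.text()
  // Expect a server-rendered H1 and JSON-LD before any JavaScript runs.
  return findMissingMarkers(html, ['<h1', 'application/ld+json'])
}
```

Running this against every URL in your sitemap after each deploy catches regressions where a page silently becomes client-rendered.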
Structured Data
For SaaS pages, add Organization, SoftwareApplication, FAQPage, BreadcrumbList, Article, and Product where relevant. Put JSON-LD in server-rendered components or metadata helpers so crawlers receive it immediately.
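One pattern is to build the JSON-LD object on the server and serialize it into a script tag, so it is present in the initial HTML. A sketch with illustrative field values:

```typescript
// Build Organization JSON-LD as a plain object on the server.
export function organizationJsonLd() {
  return {
    '@context': 'https://schema.org',
    '@type': 'Organization',
    name: 'Example SaaS',
    url: 'https://example.com',
    logo: 'https://example.com/logo.png',
  }
}

// In a server component (e.g. app/layout.tsx), embed it like:
// <script
//   type="application/ld+json"
//   dangerouslySetInnerHTML={{ __html: JSON.stringify(organizationJsonLd()) }}
// />
```

Because the component is server-rendered, the script tag ships in the raw HTML and needs no hydration to be visible to crawlers.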
Logs
Use Vercel logs, CDN logs, or server logs to verify bot access:
GPTBot
ClaudeBot
PerplexityBot
Google-Extended
Googlebot
Bingbot

Look for status codes, blocked paths, excessive redirects, and pages returning thin HTML.
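A simple classifier makes it easy to group log lines by crawler before counting status codes or blocked paths. A sketch, matching the bot list above by user-agent substring:

```typescript
// AI and search crawlers worth tracking, per the list above.
const TRACKED_BOTS = [
  'GPTBot',
  'ClaudeBot',
  'PerplexityBot',
  'Google-Extended',
  'Googlebot',
  'Bingbot',
] as const

// Return the matching bot name for a user-agent string, or null.
export function detectBot(userAgent: string): string | null {
  return TRACKED_BOTS.find((bot) => userAgent.includes(bot)) ?? null
}
```

Note that Google-Extended and Googlebot overlap only by prefix order here; list the more specific token first, as above, so it wins the match.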
Release Checklist
- Publish /llms.txt.
- Confirm robots.txt allows the AI crawlers you want.
- Include sitemap links and canonical URLs.
- Render public pages with SSG, ISR, or SSR.
- Add server-visible schema.
- Inspect raw HTML with bot user agents.
- Monitor AI mentions and source citations in GEO Scout.
The goal is not to create a special website for bots. The goal is to make your best public information easy to retrieve, understand, and cite.
Frequently Asked Questions
Where should llms.txt live in a Next.js app?
Should llms.txt replace robots.txt?
Do AI crawlers need SSR in Next.js?
Can GEO Scout measure the impact?
Related Articles
AI Crawler Readiness Checklist: Is Your Site Ready for GPTBot, OAI-SearchBot, and Others?
A technical checklist for AI crawler readiness covering robots.txt, sitemaps, SSR, status codes, logs, CDN rules, rate limits, structured data, and unblocked content.
IndexNow for Next.js: Faster Discovery for AI Search and Bing Copilot
How to implement IndexNow in Next.js for published and updated pages, including API routes, keys, sitemaps, canonical URLs, and GEO measurement.
Schema for SaaS Feature Pages: Structured Data for AI Answers
A technical checklist for SaaS feature page schema with SoftwareApplication, FAQPage, BreadcrumbList, Organization, Product signals, canonical URLs, and server rendering.