
llms.txt for Astro: Static GEO Implementation for AI Crawlers

How to make Astro sites AI-crawler ready with llms.txt, robots.txt, sitemap, canonical URLs, structured data, static HTML, and crawler log checks.

Tags: llms.txt, Astro, AI crawlers, static sites
Vladislav Puchkov
Founder of GEO Scout, GEO optimization expert

Astro is one of the easiest frameworks to make GEO-friendly because crawlers often receive complete HTML without executing JavaScript. That advantage disappears if the site blocks bots, omits sitemaps, duplicates URLs, or moves important content into client-only components.

The implementation goal is simple: AI crawlers should discover canonical pages, retrieve complete content, understand entities through structured data, and cite the right URLs.

Root Files

For a static setup:

public/llms.txt
public/robots.txt
public/favicon.svg

For generated routes:

src/pages/llms.txt.ts
src/pages/sitemap.xml.ts

Either way, the files must resolve at these root URLs:

/llms.txt
/robots.txt
/sitemap.xml
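The generated-route option can be sketched as an Astro endpoint. This is an illustrative sketch, not the file from a real project: the site URL, sections, and slug list are placeholders, and in practice you would feed slugs from a content collection (e.g. `getCollection('blog')`) rather than a hard-coded array.

```typescript
// Hypothetical sketch of src/pages/llms.txt.ts: serve llms.txt as a
// plain-text endpoint instead of a static file in public/.
const SITE = "https://example.com";

export function buildLlmsTxt(slugs: string[]): string {
  const lines = [
    "# Example Astro Site",
    "",
    "> Official product, documentation, pricing, and customer proof for Example.",
    "",
    "## Learn",
    ...slugs.map((slug) => `- ${SITE}/blog/${slug}/`),
  ];
  return lines.join("\n") + "\n";
}

// Astro calls exported HTTP-method handlers for .ts routes under src/pages/.
export async function GET(): Promise<Response> {
  // In a real project, pull slugs from a content collection instead.
  const slugs = ["llms-txt-for-astro"];
  return new Response(buildLlmsTxt(slugs), {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```

The endpoint approach pays off once the page list changes often; for a stable marketing site, the static file in `public/` is simpler.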

llms.txt Template

# Example Astro Site
 
> Official product, documentation, pricing, and customer proof for Example.
 
## Product
- https://example.com/
- https://example.com/features/
- https://example.com/pricing/
 
## Learn
- https://example.com/blog/
- https://example.com/docs/
 
## Trust
- https://example.com/customers/
- https://example.com/security/

Use URLs that you actually want AI systems to understand and cite.

robots.txt

User-agent: GPTBot
Allow: /
 
User-agent: ClaudeBot
Allow: /
 
User-agent: PerplexityBot
Allow: /
 
User-agent: *
Disallow: /admin/
Disallow: /preview/
Disallow: /api/
 
Sitemap: https://example.com/sitemap.xml

Astro preview routes, CMS previews, and admin screens should stay blocked. Public docs, blogs, customer pages, and product pages should stay accessible.

Static HTML Checklist

Before release, run:

curl -A "GPTBot/1.0" -s https://example.com/features/ | sed -n '1,120p'

Check that the raw HTML includes:

  • a single descriptive H1;
  • product or page-specific body text;
  • internal links;
  • canonical link;
  • metadata;
  • JSON-LD;
  • FAQ if the page has one;
  • no reliance on client-only rendering for any critical content.
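Parts of this checklist can be automated in CI. A minimal sketch, using naive regex checks over the raw HTML rather than a real HTML parser; the function name and result shape are inventions for illustration:

```typescript
// Naive regex-based checks over raw HTML. A real audit would use an
// HTML parser, but this is enough to catch an empty client-side shell.
export interface HtmlCheckResult {
  hasSingleH1: boolean;
  hasCanonical: boolean;
  hasJsonLd: boolean;
  hasInternalLinks: boolean;
}

export function checkRawHtml(html: string): HtmlCheckResult {
  const h1Count = (html.match(/<h1[\s>]/gi) ?? []).length;
  return {
    hasSingleH1: h1Count === 1,
    hasCanonical: /<link[^>]+rel=["']canonical["']/i.test(html),
    hasJsonLd: /<script[^>]+application\/ld\+json/i.test(html),
    hasInternalLinks: /<a\s[^>]*href=["']\//i.test(html),
  };
}
```

Run it against the output of the curl command above and fail the build when any flag is false.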

Astro Islands

Use islands for calculators, demos, forms, carousels, and filters. Keep crawler-critical content outside client-only islands:

---
const faq = [
  { question: 'Who is this for?', answer: 'B2B teams comparing vendors.' },
];
---
 
<h1>AI-ready product analytics</h1>
<p>Server-visible product explanation for buyers and AI crawlers.</p>
 
<InteractiveDemo client:load />

The demo can hydrate later. The explanation must be present immediately.

Canonicals and Collections

Astro content collections can produce many URLs. Avoid duplicate trailing-slash patterns, tag archives without canonical strategy, and translated pages without language links.

Each article, doc, and case study should have:

  • canonical URL;
  • published and updated dates;
  • author;
  • breadcrumb;
  • stable slug;
  • links to related pages.
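One way to avoid duplicate trailing-slash patterns is to route every canonical through a single helper that enforces one policy site-wide. A sketch, assuming an "always trailing slash" convention (pick whichever policy matches your Astro `trailingSlash` setting, and apply it consistently); the site URL and function name are placeholders:

```typescript
// Enforce one canonical URL shape: absolute, no query, no fragment,
// always a trailing slash. Assumes an "always trailing slash" policy.
const SITE = "https://example.com";

export function canonicalUrl(path: string): string {
  const url = new URL(path, SITE);
  url.search = ""; // query strings never belong in a canonical
  url.hash = "";
  if (!url.pathname.endsWith("/")) url.pathname += "/";
  return url.toString();
}
```

Call it from the layout's `<link rel="canonical">` tag so every collection entry goes through the same normalization.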

Structured Data

Use Article for blog posts, FAQPage for FAQ sections, BreadcrumbList for navigation, Organization for the brand, and SoftwareApplication for SaaS products.

Example:

<script type="application/ld+json" set:html={JSON.stringify({
  '@context': 'https://schema.org',
  '@type': 'Article',
  headline: Astro.props.title,
  datePublished: Astro.props.publishedAt,
  dateModified: Astro.props.updatedAt,
})} />
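The same `set:html` pattern works for FAQ sections. A sketch of a FAQPage payload builder; the `FaqItem` shape is an assumption about your own data model, while the `@type` and `mainEntity` structure follows schema.org:

```typescript
interface FaqItem {
  question: string;
  answer: string;
}

// Build a schema.org FAQPage object from Q&A pairs, ready to
// serialize into a JSON-LD <script> tag via set:html.
export function faqPageJsonLd(items: FaqItem[]) {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: items.map((item) => ({
      "@type": "Question",
      name: item.question,
      acceptedAnswer: { "@type": "Answer", text: item.answer },
    })),
  };
}
```

Because the FAQ array already lives in component frontmatter (as in the islands example above), the visible FAQ and the JSON-LD stay in sync from one source of truth.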

Log Checks

Static hosting still has logs through platforms such as Cloudflare, Vercel, Netlify, or your CDN. Review whether AI crawlers receive 200 responses for public URLs and whether they waste crawl budget on redirects, old slugs, or blocked preview pages.
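A crawl review like this can be scripted. A minimal sketch, assuming access logs in combined log format (status code after the quoted request, user agent at the end); the bot list is illustrative and worth extending for your own monitoring:

```typescript
// Tally HTTP status codes per AI crawler from access-log lines.
// Assumes combined log format; the bot name list is illustrative.
const AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot"];

export function tallyBotStatuses(lines: string[]): Map<string, Map<number, number>> {
  const tally = new Map<string, Map<number, number>>();
  for (const line of lines) {
    const bot = AI_BOTS.find((name) => line.includes(name));
    if (!bot) continue;
    // Status code sits between the closing request quote and the byte count.
    const status = Number(line.match(/"\s(\d{3})\s/)?.[1]);
    if (!status) continue;
    const perBot = tally.get(bot) ?? new Map<number, number>();
    perBot.set(status, (perBot.get(status) ?? 0) + 1);
    tally.set(bot, perBot);
  }
  return tally;
}
```

A high count of 301s or 403s for a bot is the signal to fix redirects or robots rules before expecting citations to improve.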

Astro GEO Checklist

  1. Publish /llms.txt.
  2. Confirm robots.txt does not block useful pages.
  3. Generate a clean sitemap from collections.
  4. Keep critical content in static HTML.
  5. Add canonical URLs and schema.
  6. Validate raw HTML with bot user agents.
  7. Monitor source citations with GEO Scout on geoscout.pro.

Astro gives you the right rendering model by default. GEO work is about keeping that output structured, discoverable, and measurable.

Frequently Asked Questions

Is Astro good for AI crawler readiness?
Yes. Astro outputs static HTML by default, which is ideal for many AI crawlers, as long as important content, links, and schema are present in the generated HTML.
Where does llms.txt go in Astro?
Place a static llms.txt file in public, or generate it through an Astro endpoint if you want it to reflect your content collection.
Do Astro islands hurt GEO?
Islands are fine when they enhance interaction. Do not put critical product copy, FAQ, case-study details, or JSON-LD only inside client-rendered islands.
How do I measure Astro GEO changes?
Use logs to confirm crawler access and GEO Scout at geoscout.pro to monitor whether AI systems cite the Astro pages after recrawling.