AI systems are becoming primary discovery channels. ChatGPT, Perplexity, Claude, Google AI Overviews, and Gemini now answer hundreds of millions of queries every day. When they answer a question that your content could address, they make a retrieval decision: which sources should inform this answer, and which should be cited?
If your website isn't AI-readable, you're invisible to a growing share of your audience. And the gap between AI-readable and AI-invisible is widening fast.
This guide covers everything you need to do to make your website visible to AI systems - from the format they prefer, to the files they look for, to the analytics that tell you whether it's working.
The problem: your website wasn't built for AI
Your website was built for humans. It has navigation menus, footers, sidebars, tracking scripts, cookie banners, SVG icons, and deeply nested div structures.
AI agents don't see the rendered page. They see raw HTML - thousands of tokens of noise with your actual content buried somewhere in the middle. A typical CMS page consumes over 16,000 tokens in HTML. The same content in clean markdown takes roughly 3,000 tokens. That's an 80% waste problem.
Processing power is expensive. Every token an AI system processes costs compute. When AI agents are evaluating millions of pages to answer a query, a page that wastes 80% of their processing budget on navigation chrome and tracking scripts loses to a page that delivers pure content.
The five components of an AI-readable website
Making your website AI-readable isn't one thing. It's five things working together.
1. Markdown - the format AI systems actually want
Markdown is a lightweight text formatting language that's both human-readable and machine-readable. It uses simple characters to indicate structure: # for headings, ** for bold, - for lists. No wrapper divs, no CSS classes, no scripts.
AI systems prefer markdown because it's clean, structured, and token-efficient. The same content that takes 16,000 tokens in HTML takes roughly 3,000 in markdown - and every one of those tokens is actual content, not noise.
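To make that concrete, here's a hypothetical product page reduced to the markdown an AI system actually needs - no wrapper divs, no classes, no scripts:

```markdown
# Acme Widget Pro

**Acme Widget Pro** is a self-hosted widget server for small teams.

## Key features

- One-command install
- REST and GraphQL APIs
- Role-based access control
```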
2. llms.txt - the site-level index
llms.txt is a markdown file at your website's root that gives AI systems a curated table of contents. It tells them what your site is about, what sections it has, and where to find the important content. Think of it as a directory at the front of a building.
Every website should have one. It's static, it's referenced in the `<head>` of every page, and it's the entry point for any AI agent trying to understand your site.
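Here's a minimal sketch of an llms.txt, using a hypothetical documentation site - the conventional shape is an H1 title, a one-line blockquote summary, then H2 sections listing key pages:

```markdown
# Acme Docs

> Documentation for Acme Widget Pro, a self-hosted widget server for small teams.

## Guides

- [Getting started](https://example.com/docs/getting-started.md): install and first run
- [API reference](https://example.com/docs/api.md): REST and GraphQL endpoints

## Optional

- [Changelog](https://example.com/changelog.md): full release history
```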
3. Per-page .md files - the gold standard
If llms.txt is the directory, per-page .md files are the rooms themselves. Each page on your site gets a clean markdown version - complete with YAML frontmatter containing the title, description, canonical URL, and a freshness timestamp.
This is where the real depth lives. An AI agent can read your llms.txt to understand your site structure, then fetch any individual page as clean markdown to get the full content. This is the gold standard for AI readability.
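As a sketch, a per-page .md file might open like this - the frontmatter field names are illustrative and vary by tooling, but they carry the four pieces of metadata mentioned above:

```markdown
---
title: Getting started with Acme Widget Pro
description: Install Acme Widget Pro and run your first widget in five minutes.
canonical: https://example.com/docs/getting-started
last_updated: 2026-01-15
---

# Getting started with Acme Widget Pro

Install the server with a single command...
```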
4. Unblocked crawlers - the robots.txt audit
None of this matters if your robots.txt is blocking AI crawlers. Many websites have broad disallow rules, security plugins, or CMS defaults that inadvertently shut the door on ClaudeBot, GPTBot, PerplexityBot, and other AI agents. A quick audit of your robots.txt is the easiest win in AI readability.
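To check, open yoursite.com/robots.txt and look for rules that catch AI crawlers. A permissive configuration looks something like this - the user-agent strings are the ones the vendors publish:

```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Watch for blanket rules like this, which shut out everything - AI crawlers included:
# User-agent: *
# Disallow: /
```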
5. Content Signals - AI usage governance
Content Signals let you tell AI systems how they may use your content. Can they use it for training? For search results? For agentic tasks? This is the governance layer - more granular than robots.txt, and increasingly important as AI usage becomes a regulatory concern.
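One emerging convention here is the Content Signals Policy (introduced by Cloudflare), which extends robots.txt with a Content-Signal line. A sketch, assuming you want to allow search and AI answers but opt out of model training:

```
User-agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=no
Allow: /
```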
The numbers that matter
The data makes a compelling case for acting now:
- A typical page wastes 80% of its tokens on HTML noise that AI systems don't need
- Referral traffic from ChatGPT converts at 15.9% and from Perplexity at 10.5% - both above the average for organic search
- Only 5-15% of websites have implemented llms.txt as of early 2026
- Gartner predicted that search engine query volume would drop 25% by 2026 as users shift to AI chatbots
The opportunity is wide open. Most of your competitors haven't done this yet.
The bigger picture: AEO
This is part of a fundamental shift from SEO to AEO - Answer Engine Optimisation. SEO success was positional: rank 3 vs rank 7. AEO success is binary. You're either part of the AI's synthesised answer or you're invisible. There's no page two in AI search.
The technical infrastructure covered in this guide - markdown, llms.txt, per-page .md files, unblocked crawlers, Content Signals - is the foundation that AEO content strategy sits on. Without it, even brilliant content can be invisible to AI systems.
How to measure progress
The biggest question in AEO right now is: how do I know if AI systems have actually indexed my content?
Your standard web analytics can't tell you. Google Analytics and HubSpot filter out bot traffic. You need dedicated AI crawl analytics that show you which bots are visiting, which pages they're hitting, how frequently, and whether activity is increasing or decreasing.
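If you have access to raw server logs, even a rough first pass gives you this signal before you adopt dedicated tooling. A minimal sketch in Python - the log path and user-agent substrings are assumptions to adapt to your own setup:

```python
from collections import Counter

# Substrings that identify common AI crawlers in User-Agent headers.
# Extend this list as new bots appear.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "ChatGPT-User"]

hits = Counter()
with open("access.log", encoding="utf-8", errors="replace") as log:  # assumed path
    for line in log:
        for bot in AI_BOTS:
            if bot in line:
                hits[bot] += 1
                break  # count each request once

for bot, count in hits.most_common():
    print(f"{bot}: {count} requests")
```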
This is the feedback loop that makes everything else measurable. Without it, you're optimising blind.
Where to start
If you're starting from scratch, here's the priority order:
1. Audit your robots.txt - make sure you're not blocking AI crawlers. This takes five minutes and is the highest-impact quick win.
2. Create an llms.txt file - give AI systems a curated index of your site. Even a basic one is better than nothing.
3. Set up per-page .md files - this is the gold standard. Every page on your site should have a clean markdown version available.
4. Add discovery tags - put `<link>` tags in your HTML so AI agents can find your markdown (see the example after this list).
5. Monitor AI bot traffic - track which bots are visiting, which pages they're hitting, and how activity trends over time.
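For step 4, the common pattern is a `<link rel="alternate">` tag in each page's `<head>` pointing at its markdown version - a sketch, with the href being whatever URL your pipeline serves:

```html
<link rel="alternate" type="text/markdown"
      href="https://example.com/docs/getting-started.md"
      title="Markdown version of this page">
```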
Getmd.ai handles steps 2-5 automatically - it converts your pages to markdown on the fly, hosts your llms.txt, generates discovery tag code for your CMS, and provides AI crawl analytics. But whether you use a tool or build your own pipeline, the important thing is to start.
The websites that become AI-readable now will be the ones AI systems default to citing tomorrow. The ones that wait will wonder why they're invisible.
This is the first article in our series on making your website AI-readable. Read the full series: What is markdown? · What is llms.txt? · Per-page .md files · The robots.txt audit · Content structure for AI citation · Content Signals · How to track LLM indexing