Robots.txt Generator
Generate a robots.txt file with full control over AI crawlers — GPTBot, ClaudeBot, PerplexityBot, and more. Configure traditional search bots and custom rules. Download ready to deploy.
Traditional Crawlers
Googlebot · Bingbot · YandexBot · Baiduspider

AI Crawlers
Toggle to block or allow each AI crawler. Blocking prevents your content from being used for AI training and from appearing as a citation in AI answers.
OpenAI / ChatGPT
Anthropic
Perplexity AI
Common Crawl
ByteDance / TikTok
Google (Gemini training)
Custom Rules
Sitemap URL
```
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: CCBot
Allow: /

User-agent: Bytespider
Allow: /

User-agent: Google-Extended
Allow: /
```
Upload robots.txt to your domain root so it is accessible at /robots.txt.
How to Use the Robots.txt Generator
A robots.txt file is one of the most fundamental technical SEO assets. Placed at your domain root, it communicates crawl preferences to every bot that visits your site — from Googlebot to the newest AI crawlers training large language models on your content.
In 2024 and 2025, the landscape of web crawlers expanded dramatically. Beyond traditional search bots, AI companies now operate their own crawlers to collect training data and power real-time AI search answers. Understanding and controlling these bots has become a critical part of modern SEO strategy.
Step-by-Step Guide
Configure traditional crawlers
Enable specific search engine bots (Googlebot, Bingbot, Yandex, Baidu) and set Allow, Disallow, or a custom path restriction for each.
Set AI crawler policies
Use the toggle cards in the AI Crawlers section to allow or block each AI bot individually. Green means the bot can access your site; red means it's blocked.
Add custom rules
For advanced use cases, add custom User-agent + Allow/Disallow + path combinations. Useful for blocking specific directories or allowing only certain bots to certain pages.
Add sitemap & download
Enter your sitemap URL so crawlers can find your content. Click Download to get robots.txt ready to upload to your domain root.
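As one possible output of the steps above, a configuration that welcomes search engines but blocks the major AI training crawlers might look like this (example.com stands in for your own domain):

```
# Traditional search crawlers — keep rankings intact
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# AI crawlers blocked for content protection
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

# Sitemap so crawlers can discover your content
Sitemap: https://example.com/sitemap.xml
```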
All AI Crawlers: Names, Companies & Recommendations
The following table lists every major AI crawler you may encounter in your server logs, along with their official user-agent strings, which company operates them, and whether blocking is recommended from an SEO and GEO (Generative Engine Optimization) perspective.
| Bot Name | Company | Purpose | Block? |
|---|---|---|---|
| GPTBot | OpenAI | ChatGPT training data & browsing | Allow if you want ChatGPT citations; block for content protection |
| ClaudeBot | Anthropic | Claude AI model training & improvement | Allow to improve Claude's knowledge of your site |
| PerplexityBot | Perplexity AI | Training data & real-time answer generation | Allow for Perplexity citations; block for content protection |
| CCBot | Common Crawl | Shared dataset used by many AI companies | Blocking impacts multiple AI systems simultaneously |
| Bytespider | ByteDance / TikTok | TikTok search & AI training data | Block if you don't target TikTok's search ecosystem |
| Google-Extended | Google | Gemini AI and Google AI product training | Safe to block without affecting Google Search rankings |
Robots.txt Syntax Reference
```
# Block all crawlers from entire site
User-agent: *
Disallow: /

# Allow Googlebot only
User-agent: Googlebot
Allow: /

# Block GPTBot (OpenAI)
User-agent: GPTBot
Disallow: /

# Block only /private/ directory
User-agent: *
Disallow: /private/

# Sitemap location
Sitemap: https://example.com/sitemap.xml
```
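If you want to sanity-check a ruleset before deploying it, Python's standard-library `urllib.robotparser` can parse the file and answer per-agent fetch questions. The ruleset below is a hypothetical example, not the generator's output:

```python
import urllib.robotparser

# Hypothetical ruleset: allow all crawlers except GPTBot.
ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether a given user-agent may fetch a given URL.
print(parser.can_fetch("GPTBot", "https://example.com/blog/post"))     # → False
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))  # → True
```

This mirrors how well-behaved crawlers interpret your file, so it is a quick way to confirm that a block rule actually applies to the user-agent string you intended.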
Frequently Asked Questions
What is a robots.txt file?
A robots.txt file is a plain-text file placed at the root of your website (e.g. example.com/robots.txt). It follows the Robots Exclusion Protocol and tells web crawlers — including search engines and AI bots — which pages or sections of your site they are and aren't allowed to access. It is advisory rather than technically enforced, but all reputable crawlers respect it.
What is GPTBot and should I block it?
GPTBot is OpenAI's web crawler used to collect training data for ChatGPT and other OpenAI models. If you block GPTBot, your content will not be used in future OpenAI model training. However, it also means ChatGPT may be less informed about your site. If you want citations from ChatGPT, you should allow GPTBot. If privacy or content protection is your priority, block it.
What is ClaudeBot and what does it crawl?
ClaudeBot is Anthropic's web crawler, used to improve Claude AI models. Blocking ClaudeBot prevents Anthropic from crawling your content for model training. Like GPTBot, allowing ClaudeBot may increase the likelihood that Claude accurately represents your site when users ask about topics you cover.
Does blocking AI crawlers affect Google rankings?
No. Googlebot (which powers Google Search rankings) is a separate crawler from Google-Extended (used for Gemini AI training). Blocking Google-Extended in robots.txt will not affect your Google Search rankings. You can safely block Google-Extended if you don't want your content used for Gemini training without impacting your SEO.
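To make that separation concrete, a robots.txt can opt out of Gemini training while leaving Search crawling untouched (a minimal sketch; example.com is a placeholder):

```
# Search crawling stays allowed — rankings unaffected
User-agent: Googlebot
Allow: /

# Opt out of Gemini / Google AI product training only
User-agent: Google-Extended
Disallow: /
```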
What is the difference between GPTBot, CCBot, and PerplexityBot?
GPTBot (OpenAI) crawls the web for ChatGPT training data and real-time browsing. CCBot (Common Crawl) is used by many AI companies — including OpenAI and Hugging Face — as a shared training dataset. PerplexityBot is Perplexity AI's crawler used both for training and for real-time answer generation. Blocking CCBot can reduce exposure across many AI systems at once, since it supplies data to multiple companies.
Related Tools
Want a Full AI Search Audit?
Our GEO specialists will audit your site for AI crawler accessibility, robots.txt configuration, and overall visibility in ChatGPT, Perplexity, and Google AI Overviews.
Get a Free SEO Audit