FreeSEOTools.io

Robots.txt Generator

Generate a robots.txt file with full control over AI crawlers — GPTBot, ClaudeBot, PerplexityBot, and more. Configure traditional search bots and custom rules. Download ready to deploy.

Traditional Crawlers

Default policy: Allow All
Googlebot (user-agent: Googlebot)
Bingbot (user-agent: Bingbot)
Yandex (user-agent: YandexBot)
Baidu (user-agent: Baiduspider)

AI Crawlers (GEO & AI Search)

Toggle each AI crawler to allow or block it. Blocking prevents your content from being used for AI training and from being cited in AI-generated answers.

GPTBot (OpenAI / ChatGPT): Allowed
ClaudeBot (Anthropic): Allowed
PerplexityBot (Perplexity AI): Allowed
CCBot (Common Crawl): Allowed
Bytespider (ByteDance / TikTok): Allowed
Google-Extended (Google, Gemini training): Allowed

Custom Rules

Sitemap URL

robots.txt preview: 7 rules, 0 blocked, 20 lines

User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: CCBot
Allow: /

User-agent: Bytespider
Allow: /

User-agent: Google-Extended
Allow: /

Upload robots.txt to your domain root so it is accessible at /robots.txt.

How to Use the Robots.txt Generator

A robots.txt file is one of the most fundamental technical SEO assets. Placed at your domain root, it communicates crawl preferences to every bot that visits your site — from Googlebot to the newest AI crawlers training large language models on your content.

In 2024 and 2025, the landscape of web crawlers expanded dramatically. Beyond traditional search bots, AI companies now operate their own crawlers to collect training data and power real-time AI search answers. Understanding and controlling these bots has become a critical part of modern SEO strategy.

Step-by-Step Guide

1

Configure traditional crawlers

Enable specific search engine bots (Googlebot, Bingbot, Yandex, Baidu) and set Allow, Disallow, or a custom path restriction for each.

2

Set AI crawler policies

Use the toggle cards in the AI Crawlers section to allow or block each AI bot individually. Green means the bot can access your site; red means it's blocked.

3

Add custom rules

For advanced use cases, add custom User-agent + Allow/Disallow + path combinations. Useful for blocking specific directories or allowing only certain bots to certain pages.

4

Add sitemap & download

Enter your sitemap URL so crawlers can find your content. Click Download to get robots.txt ready to upload to your domain root.
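The assembly logic behind the steps above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the tool's actual source; the `build_robots` helper and the toggle dictionary are hypothetical:

```python
def build_robots(ai_crawlers, sitemap=None):
    """Assemble a robots.txt string from per-bot allow/block toggles."""
    lines = ["User-agent: *", "Allow: /", ""]  # default policy for all bots
    for bot, allowed in ai_crawlers.items():
        lines += [f"User-agent: {bot}",
                  f"{'Allow' if allowed else 'Disallow'}: /",
                  ""]
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines)

# Example: allow GPTBot and ClaudeBot, block Bytespider
toggles = {"GPTBot": True, "ClaudeBot": True, "Bytespider": False}
print(build_robots(toggles, sitemap="https://example.com/sitemap.xml"))
```

Each toggle simply flips a bot's rule between `Allow: /` and `Disallow: /`, which is why the generated file stays small and readable.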

All AI Crawlers: Names, Companies & Recommendations

The following table lists every major AI crawler you may encounter in your server logs, along with their official user-agent strings, which company operates them, and whether blocking is recommended from an SEO and GEO (Generative Engine Optimization) perspective.

Bot Name | Company | Purpose | Block?
GPTBot | OpenAI | ChatGPT training data & browsing | Allow if you want ChatGPT citations; block for content protection
ClaudeBot | Anthropic | Claude AI model training & improvement | Allow to improve Claude's knowledge of your site
PerplexityBot | Perplexity AI | Training data & real-time answer generation | Allow for Perplexity citations; block for content protection
CCBot | Common Crawl | Shared dataset used by many AI companies | Blocking impacts multiple AI systems simultaneously
Bytespider | ByteDance / TikTok | TikTok search & AI training data | Block if you don't target TikTok's search ecosystem
Google-Extended | Google | Gemini AI and Google AI product training | Safe to block without affecting Google Search rankings
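Taken together, a "block all AI training" configuration derived from the table looks like this (a sketch; keep or drop individual bots to match your own policy):

```text
# Block AI training crawlers; Googlebot and Bingbot are unaffected
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Google-Extended
Disallow: /
```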

Robots.txt Syntax Reference

robots.txt — syntax reference
# Block all crawlers from entire site
User-agent: *
Disallow: /

# Allow Googlebot only
User-agent: Googlebot
Allow: /

# Block GPTBot (OpenAI)
User-agent: GPTBot
Disallow: /

# Block only /private/ directory
User-agent: *
Disallow: /private/

# Sitemap location
Sitemap: https://example.com/sitemap.xml
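Before deploying, the behavior of these directives can be verified with Python's standard-library urllib.robotparser (a quick sketch; example.com is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# The "Allow Googlebot only" pattern from the reference above
rules = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("Googlebot", "https://example.com/page"))   # allowed
print(rp.can_fetch("RandomBot", "https://example.com/page"))   # blocked by *
```

The parser applies the most specific matching User-agent group, so the Googlebot record overrides the wildcard block, mirroring how real crawlers interpret the file.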

Frequently Asked Questions

What is a robots.txt file?

A robots.txt file is a plain-text file placed at the root of your website (e.g. example.com/robots.txt). It follows the Robots Exclusion Protocol and tells web crawlers, including search engines and AI bots, which pages or sections of your site they are and aren't allowed to access. Compliance is voluntary (the protocol has no enforcement mechanism), but all reputable crawlers respect it.

What is GPTBot and should I block it?

GPTBot is OpenAI's web crawler used to collect training data for ChatGPT and other OpenAI models. If you block GPTBot, your content will not be used in future OpenAI model training. However, it also means ChatGPT may be less informed about your site. If you want citations from ChatGPT, you should allow GPTBot. If privacy or content protection is your priority, block it.

What is ClaudeBot and what does it crawl?

ClaudeBot is Anthropic's web crawler. It is used to improve Claude AI models. Blocking ClaudeBot prevents Anthropic from indexing your content for training. Like GPTBot, allowing ClaudeBot may increase the likelihood that Claude accurately represents your site when users ask about topics you cover.

Does blocking AI crawlers affect Google rankings?

No. Googlebot (which powers Google Search rankings) is a separate crawler from Google-Extended (used for Gemini AI training). Blocking Google-Extended in robots.txt will not affect your Google Search rankings. You can safely block Google-Extended if you don't want your content used for Gemini training without impacting your SEO.
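In robots.txt terms, that separation looks like this (sketch):

```text
# Search indexing stays on
User-agent: Googlebot
Allow: /

# Gemini / AI training opts out
User-agent: Google-Extended
Disallow: /
```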

What is the difference between GPTBot, CCBot, and PerplexityBot?

GPTBot (OpenAI) crawls the web for ChatGPT training data and real-time browsing. CCBot (Common Crawl) is used by many AI companies — including OpenAI and Hugging Face — as a shared training dataset. PerplexityBot is Perplexity AI's crawler used both for training and for real-time answer generation. Blocking CCBot can reduce exposure across many AI systems at once, since it supplies data to multiple companies.


Want a Full AI Search Audit?

Our GEO specialists will audit your site for AI crawler accessibility, robots.txt configuration, and overall visibility in ChatGPT, Perplexity, and Google AI Overviews.

Get a Free SEO Audit