Robots.txt Rule Tester
Paste your robots.txt and test whether any URL is allowed or blocked for Googlebot, GPTBot, ClaudeBot, Bingbot, or any custom user-agent. Understand exactly which rule matched.
How to Use the Robots.txt Rule Tester
Paste your robots.txt content into the large text area. Click Load Sample to try a demo file. Enter the URL path you want to test (e.g. /admin/ or https://example.com/blog/post-1), select a user-agent from the dropdown or type a custom one, then click Test Rule.
Understanding the Results
ALLOWED means the crawler is permitted to access this URL. Either an Allow rule matched, or no Disallow rule applies.
BLOCKED means a Disallow rule prevents the crawler from accessing this URL. The specific rule that caused the block is shown.
The Matched Rule field shows the exact Allow or Disallow directive from your robots.txt that determined the result.
The Matched Agent field shows which User-agent block the matching rule came from — the specific bot block or the wildcard (*) fallback.
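For example, given this robots.txt (a made-up file for illustration):

```
User-agent: *
Disallow: /admin/
```

Testing /admin/settings for Googlebot would report BLOCKED, with Matched Rule "Disallow: /admin/" and Matched Agent "*", since the file contains no Googlebot-specific block.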
Frequently Asked Questions
How does a robots.txt rule tester work?
A robots.txt rule tester parses your robots.txt content and simulates how a crawler evaluates a given URL against the declared rules. It first selects the applicable User-agent block: a block naming the bot specifically takes precedence over the wildcard (*) block. Within that block, the most specific (longest) matching rule wins, and Allow and Disallow rules of equal length resolve in favor of Allow.
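As a rough illustration, here is a minimal Python sketch of that evaluation logic for a single User-agent block. It assumes rules have already been parsed into (directive, path) pairs and matches paths as plain prefixes; wildcard (*) and end-anchor ($) patterns, which real parsers also handle, are left out.

```python
# Minimal sketch: longest-match evaluation for one User-agent block.
# Assumes pre-parsed (directive, path) pairs; * and $ are not handled.

def evaluate(rules, url_path):
    """Return (allowed, matched_rule) for url_path."""
    best = None  # (match_key, directive, path) of the winning rule
    for directive, path in rules:
        if url_path.startswith(path):
            # Longer path wins; True > False breaks length ties for Allow.
            key = (len(path), directive == "allow")
            if best is None or key > best[0]:
                best = (key, directive, path)
    if best is None:
        return True, None  # no rule matched: allowed by default
    _, directive, path = best
    return directive == "allow", f"{directive.capitalize()}: {path}"

rules = [("disallow", "/"), ("allow", "/blog/")]
print(evaluate(rules, "/blog/post-1"))  # (True, 'Allow: /blog/')
print(evaluate(rules, "/admin/"))       # (False, 'Disallow: /')
```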
What is the correct order of precedence for robots.txt rules?
First, the tester looks for a User-agent block that matches the bot name exactly (case-insensitive). If one exists, only that block's rules apply and the wildcard (*) block is ignored for that bot, even when none of the specific block's rules match the URL (the URL is then allowed by default). If no specific block exists, the wildcard (*) rules apply. Within the selected block, the longest (most specific) matching path wins. If an Allow and a Disallow rule match at the same length, Allow takes precedence, which is Google's documented behavior.
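For instance, in this illustrative file:

```
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
```

Googlebot is governed only by its own block: /drafts/ is blocked for it, while /private/ is allowed, because the wildcard rules never apply to Googlebot. Every other bot follows the wildcard block instead.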
Why does Googlebot ignore my Disallow rule?
The most common reason is a more specific Allow rule that overrides the Disallow. For example, 'Allow: /blog/' with 'Disallow: /' means /blog/ is allowed even though everything else is blocked. Also check that you haven't defined a separate 'User-agent: Googlebot' block with its own rules — that block takes full precedence over the wildcard block for Googlebot.
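That example looks like this in a robots.txt file:

```
User-agent: *
Disallow: /
Allow: /blog/
```

Allow: /blog/ is the longer (more specific) match, so /blog/ and everything under it stays crawlable while the rest of the site is blocked.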
Should I block AI bots like GPTBot and ClaudeBot?
Blocking AI training bots (GPTBot for OpenAI, ClaudeBot for Anthropic) prevents your content from being used in AI model training datasets. It does not affect your Google search rankings. To block them, add a 'User-agent: GPTBot' block with 'Disallow: /', and repeat for ClaudeBot. These bots generally respect robots.txt, unlike many scrapers. Decide based on whether you want your content used for AI training.
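The blocks described above look like this:

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
```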
What is the difference between Googlebot and Google-Extended?
Googlebot is Google's main search crawler — it indexes your pages for Google Search. Google-Extended is a separate user-agent used by Google to train its AI products (Gemini, Vertex AI). You can block Google-Extended to opt out of AI training without affecting your search rankings. Add 'User-agent: Google-Extended' with 'Disallow: /' to block it while keeping Googlebot allowed.
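In a file with no other rules, that is simply:

```
User-agent: Google-Extended
Disallow: /
```

Googlebot has no block of its own here and no wildcard rules restrict it, so Google Search crawling is unaffected.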
Need a Full Technical SEO Audit?
Our SEO experts review your robots.txt, sitemap, crawl budget, and technical configuration to build a complete action plan for your site.
Get a Free SEO Audit