Robots.txt Analyzer
Analyze robots.txt for contradictory rules, dangerous blocks, and SEO issues with URL simulation
Enter your robots.txt content to analyze for issues and SEO best practices
Test if a specific URL is allowed or blocked for a user-agent
What is the Robots.txt Analyzer?
The Robots.txt Analyzer is a client-side tool that parses your robots.txt file and identifies issues that could affect search engine crawling and indexing. It detects contradictory rules, dangerous blocks that may hide your content from search engines, missing sitemap declarations, and structural problems — then provides a quality score with actionable recommendations.
How to Use
- Paste your robots.txt content into the input field
- Click "Analyze" or wait for automatic processing
- Review findings with severity ratings (Critical, High, Medium, Low)
- Use the URL simulator to test if specific paths are allowed or blocked
- Export a full report in JSON or Markdown format
Example: Problematic robots.txt
This robots.txt has several issues that the analyzer will detect:
User-agent: *
Disallow: /
Allow: /public/
User-agent: Googlebot
Disallow: /blog/
Allow: /blog/
Sitemap: sitemap.xml
What Issues Are Detected?
- Entire site blocked: a 'Disallow: /' rule prevents all crawling for the affected user-agent, which can remove your site from search results
- Contradictory rules: the same path is both allowed and disallowed within one user-agent group, which creates ambiguity
- Important paths blocked: blocking /sitemap.xml, /api, /feed, or /rss can hurt functionality and SEO
- Major bots blocked: specifically blocking Googlebot, Bingbot, or other major crawlers from content paths reduces visibility
- Invalid sitemap URL: Sitemap directives must use absolute URLs starting with http:// or https://
- Missing sitemap: without a Sitemap directive, crawlers may not discover your XML sitemap automatically
- Excessive crawl-delay: values above 10 seconds severely limit crawl rate and indexing speed
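As a rough illustration of how a few of these checks could work, here is a minimal Python sketch covering the entire-site block, the invalid sitemap URL, and the missing sitemap. It is not the analyzer's actual code: the function name `check_sitemaps_and_blocks` is hypothetical, and the "High" and "Medium" severities are assumptions chosen for the example (only the Critical rating for 'Disallow: /' comes from this page).

```python
def check_sitemaps_and_blocks(text: str) -> list[tuple[str, str]]:
    """Return (severity, message) findings for three of the checks above."""
    findings = []
    sitemaps = []
    agent = None  # simplification: only the most recent User-agent is tracked
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            agent = value
        elif key == "sitemap":
            sitemaps.append(value)
            # Sitemap values must be absolute URLs.
            if not value.lower().startswith(("http://", "https://")):
                findings.append(("High", f"Invalid sitemap URL: {value}"))
        elif key == "disallow" and value == "/":
            findings.append(("Critical", f"Entire site blocked for {agent}"))
    if not sitemaps:
        findings.append(("Medium", "Missing sitemap"))
    return findings
```

Run against the problematic example above, this sketch would report the 'Disallow: /' rule in the User-agent: * group and the relative 'sitemap.xml' value.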
URL Simulation: Longest-Match-Wins
The URL simulator uses the same matching strategy as Googlebot. For a given URL and user-agent, it finds all matching rules and applies the one with the longest path. On equal-length matches, Allow takes priority over Disallow. If no rules match, the URL is allowed by default.
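The matching strategy described above can be sketched in a few lines of Python. This is an illustrative sketch, not the simulator's own implementation: the function name `is_allowed` is hypothetical, rules are assumed to be (directive, path) pairs for a single user-agent group, and matching is a plain prefix comparison (wildcards such as * and $ are not handled here).

```python
def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Apply longest-match-wins to (directive, pattern) rules for one group."""
    best_len = -1
    allowed = True  # if no rule matches, the URL is allowed by default
    for directive, pattern in rules:
        if not pattern or not path.startswith(pattern):
            continue  # empty patterns and non-matching rules are skipped
        is_allow = directive.lower() == "allow"
        # A longer pattern wins; on equal length, Allow beats Disallow.
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len = len(pattern)
            allowed = is_allow
    return allowed
```

With the rules 'Disallow: /' and 'Allow: /public/', a path such as /public/page matches both, and the longer Allow pattern wins, so the path is allowed.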
Robots.txt Best Practices
- Always include a Sitemap directive pointing to your XML sitemap (absolute URL)
- Never block your entire site unless it is a staging or private environment
- Use specific paths rather than broad blocks to control crawling
- Avoid contradictory rules — pick either Allow or Disallow for each path
- Keep crawl-delay low (or remove it) — Googlebot ignores it anyway
- Test important URLs with the simulator to verify they are accessible to crawlers
- Use User-agent: * as a fallback and add specific bot rules only when needed
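The fallback behavior in the last point can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `select_rules` is hypothetical, groups are assumed to be a mapping from user-agent name to a list of (directive, path) rules, and names are compared by exact case-insensitive match. Note that a bot-specific group replaces the * group for that bot rather than being merged with it.

```python
def select_rules(
    groups: dict[str, list[tuple[str, str]]], bot: str
) -> list[tuple[str, str]]:
    """Pick the rules for `bot`, falling back to the "*" group."""
    wanted = bot.lower()
    for agent, rules in groups.items():
        if agent.lower() == wanted:
            return rules  # a specific group takes precedence over "*"
    return groups.get("*", [])  # no specific group: use the fallback
```

For example, if a file has a * group with 'Disallow: /' and a Googlebot group with 'Disallow: /blog/', only the /blog/ rule applies to Googlebot under this selection logic.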
Privacy and Security
All analysis happens entirely in your browser using JavaScript. Your robots.txt content — which may reveal internal URL structures and site architecture — is never transmitted to any server. No data is stored, logged, or shared.
Frequently Asked Questions
What does the Robots.txt Analyzer detect?
The analyzer parses user-agent groups, detects contradictory rules (same path both allowed and disallowed), simulates URL allow/block for specific crawlers, identifies sitemap declarations, flags dangerous blocks (blocking entire site, important paths, or major search engine bots), and checks for structural issues like missing sitemaps or excessive crawl-delay values.
What are contradictory rules in robots.txt?
Contradictory rules occur when the same URL path is both allowed and disallowed for the same user-agent, for example when 'Allow: /blog' and 'Disallow: /blog' appear in the same group. Because the two paths have equal length, crawlers that follow Googlebot's longest-match-wins strategy break the tie in favor of Allow, but other crawlers may resolve it differently, so the conflict should be cleaned up.
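One simple way to spot such conflicts is to intersect the Allow and Disallow paths of a group. The Python sketch below shows the idea; the function name `find_contradictions` and the (directive, path) rule format are assumptions made for illustration, not the analyzer's actual code.

```python
def find_contradictions(rules: list[tuple[str, str]]) -> list[str]:
    """Return paths that appear under both Allow and Disallow in one group."""
    allowed = {p for d, p in rules if d.lower() == "allow" and p}
    disallowed = {p for d, p in rules if d.lower() == "disallow" and p}
    # A path present in both sets is contradictory.
    return sorted(allowed & disallowed)
```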
Why is 'Disallow: /' flagged as critical?
Disallow: / blocks crawling of the entire website for the specified user-agent. When applied to all crawlers (User-agent: *) or to major search engines, it stops them from fetching any page, which effectively removes your content from search results. Use it only on staging or private environments.
How does URL allow/block simulation work?
The analyzer uses the longest-match-wins strategy (the same approach used by Googlebot). For a given URL and user-agent, it checks all matching rules and the one with the longest path prefix wins. On equal-length matches, Allow takes priority over Disallow. If no rules match, the URL is allowed by default.
Is my robots.txt sent to any server?
No. All analysis happens entirely in your browser using JavaScript. Your robots.txt content — which may reveal internal URL structures and website architecture — never leaves your device. No data is stored, logged, or transmitted.
Why should I include a Sitemap directive?
The Sitemap directive tells crawlers where to find your XML sitemap, helping them discover pages more efficiently. While crawlers may find your sitemap through other means (like Google Search Console), including it in robots.txt provides an additional discovery mechanism that works across all compliant crawlers.
What format should the input be?
Paste your robots.txt file content exactly as it appears on your server (typically at /robots.txt). The analyzer expects standard robots.txt syntax with User-agent, Allow, Disallow, Crawl-delay, and Sitemap directives. Comments (lines starting with #) are preserved but ignored during analysis.
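To make the expected syntax concrete, here is a minimal Python sketch of how such input could be split into user-agent groups and sitemap declarations. It is an approximation under stated assumptions, not the analyzer's parser: the `Group` class and `parse_robots` function are hypothetical names, comments are dropped rather than preserved, and unknown directives are silently skipped.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """One user-agent group: the agents it names and its rules."""
    agents: list[str] = field(default_factory=list)
    rules: list[tuple[str, str]] = field(default_factory=list)


def parse_robots(text: str) -> tuple[list[Group], list[str]]:
    """Return (groups, sitemaps) parsed from robots.txt content."""
    groups: list[Group] = []
    sitemaps: list[str] = []
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            # Consecutive User-agent lines share one group; a User-agent
            # line that follows rules starts a new group.
            if current is None or current.rules:
                current = Group()
                groups.append(current)
            current.agents.append(value.lower())
        elif key == "sitemap":
            sitemaps.append(value)  # Sitemap applies to the whole file
        elif key in ("allow", "disallow", "crawl-delay") and current is not None:
            current.rules.append((key, value))
    return groups, sitemaps
```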
What is crawl-delay and why is a high value flagged?
Crawl-delay tells bots how many seconds to wait between requests. Values above 10 seconds are flagged because they severely limit how quickly search engines can crawl your site, potentially causing pages to be indexed slowly or not at all. Note that Googlebot ignores crawl-delay, so the directive only affects crawlers that choose to honor it.
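The threshold check described here could look something like the following Python sketch. The function name `excessive_crawl_delays` and its `limit` parameter are hypothetical, and skipping non-numeric values is an assumption of this example rather than documented analyzer behavior.

```python
def excessive_crawl_delays(text: str, limit: float = 10.0) -> list[float]:
    """Return every Crawl-delay value in `text` that exceeds `limit` seconds."""
    flagged = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        key, sep, value = line.partition(":")
        if not sep or key.strip().lower() != "crawl-delay":
            continue
        try:
            delay = float(value.strip())
        except ValueError:
            continue  # non-numeric values are skipped in this sketch
        if delay > limit:  # exactly 10 is not flagged, only values above it
            flagged.append(delay)
    return flagged
```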