
Robots.txt Tester

Paste a website URL, pick a bot, and test whether a page path is allowed or blocked. See the exact group and rule that matched.

Allowed vs Blocked · Matched rule · Wildcards · Batch testing

Robots Rule Checker

Enter a site, choose a user-agent, and test one or many paths. The preview updates after robots.txt is fetched.

Paste a website URL and click “Fetch robots.txt”. The tool will auto-test after robots.txt loads.

Detected Directives: Sitemaps · Crawl-delay · Groups

Crawl-delay and wildcard support varies by crawler. Always verify behavior for the bot you care about.
Batch testing uses the robots.txt currently loaded from the website URL in the Single Test tab.
Paste a list of paths (or full URLs), then run the batch test.

Robots.txt Preview

No robots.txt loaded yet.

How This Tester Chooses a Rule

  1. Fetch /robots.txt from the website URL you entered.
  2. Parse it into user-agent groups (each group has one or more user-agent names and rules).
  3. Pick the best-matching group for your user-agent (most specific match beats *).
  4. Find all rules that match your path, then choose the most specific rule (longest match).
  5. If there’s a tie, Allow wins over Disallow.
Crawler behavior varies; this tool follows the common interpretation, with wildcard support when enabled. A simplified sketch of the selection logic appears below.
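
For illustration, here is a minimal TypeScript sketch of steps 4 and 5 under the assumptions above. The names (Rule, isAllowed) are hypothetical and the matching is simplified to plain prefix comparison; this is not the tool's actual source.

  // Decide whether a path is allowed under the rules of the chosen user-agent group.
  interface Rule {
    type: "allow" | "disallow";
    pattern: string; // e.g. "/private/" or "/public/"
  }

  function isAllowed(path: string, rules: Rule[]): boolean {
    let best: Rule | null = null;
    let bestLength = -1;

    for (const rule of rules) {
      if (!path.startsWith(rule.pattern)) continue; // plain prefix match; wildcards handled separately
      const length = rule.pattern.length;           // specificity = pattern length (longest match wins)
      if (length > bestLength || (length === bestLength && rule.type === "allow")) {
        best = rule;                                // on a tie, Allow beats Disallow
        bestLength = length;
      }
    }

    return best === null || best.type === "allow"; // no matching rule: allowed by default
  }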

What If My Result Looks Wrong?

  • Check whether you tested the correct user-agent (e.g., Googlebot vs Googlebot-Image).
  • Confirm whether your rules use * or $, and whether wildcard mode is enabled.
  • Make sure you tested the right path format (leading /).
  • Remember: robots.txt controls crawling, not indexing. “Allowed” does not mean “indexed.”

What Is Robots.txt and What Does It Control

robots.txt is a small text file located at the root of a website (for example, https://example.com/robots.txt) that provides crawl instructions for automated agents such as search engine bots. When a crawler arrives at your site, it typically requests robots.txt first, then decides which parts of the site it is permitted to fetch. In practice, robots.txt is one of the fastest ways to shape crawl behavior and reduce unnecessary bot traffic to areas of your site that don’t need to be crawled.

It’s important to understand what robots.txt can and cannot do. It’s primarily a crawling control, not a guaranteed indexing control. If your goal is to prevent a URL from appearing in search results, robots.txt alone is not always enough. A URL can still sometimes appear as a “URL-only” result if it is linked from elsewhere, even when crawling is restricted. If your goal is “do not show this page in Google,” you typically want noindex (and you usually need to allow crawling so the crawler can see that directive).

Why Use a Robots.txt Tester

robots.txt looks simple, but real-world behavior can be confusing. Multiple user-agent groups can exist, wildcards can change which paths match, and a single missing slash can flip an “allowed” page into “blocked.” A Robots.txt Tester helps you turn rules into outcomes. Instead of guessing what Disallow: /private/ does for a specific bot, you can test exact URLs and see the decision and the matched rule.

This tool is built for audits, launches, migrations, and quick troubleshooting. If a new section isn’t being crawled, a tester helps you confirm whether bots are blocked. If you’re removing a block, it helps you validate that Allow rules actually override Disallow where you intended.

How Robots.txt Is Structured

robots.txt is usually organized into one or more groups. Each group starts with one or more User-agent: lines followed by directives that apply to those agents. The most common directives are:

  • Disallow: paths the bot should not crawl.
  • Allow: exceptions that permit crawling inside an otherwise disallowed area.
  • Sitemap: sitemap URLs that help bots discover URLs efficiently.
  • Crawl-delay: a request to slow crawling frequency (not supported by all crawlers).

Groups matter because different bots may follow different rules. It’s normal to allow Googlebot to crawl areas that you block for aggressive third-party scrapers. It’s also common to have a catch-all group for User-agent: * plus special cases for individual crawlers.
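
For example, a hypothetical robots.txt that combines a restrictive catch-all group, a more permissive group for Googlebot, and a sitemap line could look like this:

  User-agent: *
  Disallow: /           # block everything for bots not listed below
  Crawl-delay: 10

  User-agent: Googlebot
  Disallow: /private/   # Googlebot may crawl everything except /private/

  Sitemap: https://example.com/sitemap.xml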

What “Most Specific Match” Means

When multiple rules match a path, crawlers typically choose the most specific matching rule. The easiest practical model is: the longest matching pattern wins. For example, suppose your robots.txt includes:

  • Disallow: /
  • Allow: /public/

A URL like /public/page matches both patterns, but /public/ is more specific than /, so it wins and the path is allowed. This is why a tester should show the matched rule and its specificity: it turns “robots logic” into a clear explanation you can audit.
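
Continuing the earlier isAllowed sketch (same hypothetical names), the example reads like this:

  const rules: Rule[] = [
    { type: "disallow", pattern: "/" },        // matches every path, length 1
    { type: "allow",    pattern: "/public/" }, // more specific, length 8
  ];

  isAllowed("/public/page", rules); // true:  the longer Allow pattern wins
  isAllowed("/checkout", rules);    // false: only "Disallow: /" matches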

Wildcards and End Markers: * and $

Many teams use wildcard matching to control crawling of URL patterns like parameters or file types. Two common symbols appear in robots rules:

  • * matches any sequence of characters.
  • $ means “match the end of the URL path.”

A practical example is blocking internal search parameters: Disallow: /*?s= or blocking specific file types: Disallow: /*.pdf$. Support can vary by crawler, but many major crawlers interpret these symbols in a similar way. This tool lets you toggle wildcard support so you can see how decisions change if a bot treats these as plain prefixes.
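
As a rough model of that behavior, the sketch below converts a pattern to a regular expression, treating * as "any characters" and a trailing $ as an end anchor. It mirrors the common interpretation, not any specific crawler's implementation.

  function patternToRegExp(pattern: string): RegExp {
    const anchored = pattern.endsWith("$");
    const body = anchored ? pattern.slice(0, -1) : pattern;
    const source = body
      .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters (except *)
      .replace(/\*/g, ".*");                 // * matches any sequence of characters
    return new RegExp("^" + source + (anchored ? "$" : ""));
  }

  patternToRegExp("/*?s=").test("/blog?s=shoes");           // true:  blocks internal search URLs
  patternToRegExp("/*.pdf$").test("/files/guide.pdf");      // true:  ends in .pdf
  patternToRegExp("/*.pdf$").test("/files/guide.pdf?dl=1"); // false: $ requires the match to end there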

Does Robots.txt Affect SEO Rankings

robots.txt can influence SEO indirectly because it affects what bots can crawl and how efficiently they can allocate crawl budget. Blocking duplicate or low-value pages can help crawlers focus on important pages. But blocking critical resources or sections by mistake can reduce crawl coverage and lead to slower discovery or incomplete indexing.

A common high-impact mistake is blocking JavaScript or CSS resources that Google needs to render pages properly. Another is blocking pagination or faceted navigation incorrectly, which can remove product discovery pathways. A tester helps you validate the crawl rules you intend, especially for patterns that include wildcards.

Robots.txt vs Noindex vs Canonical

Robots.txt

Robots.txt tells crawlers which parts of a site they may fetch. It's best for crawl efficiency and preventing bot overload. It is not a reliable indexing-removal tool.

Noindex

Noindex is a meta robots directive (or header) that tells crawlers not to include a page in search results. It requires the crawler to access the page, which means robots.txt should generally allow crawling if you depend on noindex.

Canonical

Canonical is a hint to consolidate indexing signals across duplicate or near-duplicate pages. It does not block crawling, and it does not guarantee removal, but it helps search engines choose which version to treat as primary.
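
For reference, both signals typically live in the page's <head>; the URL below is only a placeholder:

  <!-- Ask search engines not to index this page (crawling must be allowed so they can see it) -->
  <meta name="robots" content="noindex">

  <!-- Consolidate signals from duplicate or parameterized variants onto the preferred URL -->
  <link rel="canonical" href="https://example.com/preferred-page/">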

Step-by-Step: How to Test Robots Rules on Any Site

  1. Paste the site URL (homepage is fine). The tool fetches /robots.txt (see the sketch after this list).
  2. Pick the user-agent you care about. Different bots may have different groups.
  3. Enter a path or full URL you want to test.
  4. Run the test and review the matched group and matched rule.
  5. No need to switch between mobile and desktop: robots rules are the same for both; only the bot and path matter.
  6. Batch test a list of important pages before a release or migration.
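
A minimal sketch of that first fetch step, assuming a fetch-based lookup with illustrative names (a real tool may need a proxy for cross-origin requests):

  async function fetchRobotsTxt(siteUrl: string): Promise<string> {
    // Resolve /robots.txt against the site's origin, whatever page URL was pasted.
    const robotsUrl = new URL("/robots.txt", siteUrl).toString();
    const response = await fetch(robotsUrl);
    if (!response.ok) return ""; // a missing robots.txt is commonly treated as allow-all
    return response.text();
  }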

Common Robots.txt Mistakes and Fixes

1) Blocking the entire site accidentally

Disallow: / under User-agent: * blocks all crawling for most bots. If this was intended only for staging, ensure production robots.txt is correct after deployment. Use this tester to verify critical pages are allowed.
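
The accidental block is just two lines:

  User-agent: *
  Disallow: /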

2) Allow rules that don’t override as expected

Allow only helps when it is more specific than a Disallow that also matches. If your Allow is shorter or less specific than the Disallow, it may not win. The “Matched Rule” output in this tool helps you see which directive actually decides the outcome.
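
For example, in the hypothetical rules below the Allow pattern is shorter than the Disallow that also matches, so a URL like /downloads/reports/q1.pdf stays blocked under longest-match logic:

  User-agent: *
  Allow: /downloads/
  Disallow: /downloads/reports/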

3) Misunderstanding query strings

Some robots patterns target parameters. This tool can include query strings in the test or ignore them so you can compare outcomes. If you’re blocking URL parameters, make sure you are matching the path format your crawler uses.

4) Confusing crawling blocks with indexing blocks

If your goal is to remove URLs from results, robots.txt is not the primary tool. Use noindex or other removal methods. robots.txt is best for “don’t crawl.”

When to Use Batch Testing

Batch testing is ideal before a launch, redesign, or migration. Paste your most important URLs: homepage, category pages, product pages, blog posts, and key tools. Then verify they are allowed for your primary crawler. This helps prevent “we shipped a robots block” incidents, which are surprisingly common.

Batch testing is also useful for auditing blocks. If you suspect bots are wasting crawl budget on low-value URLs, you can test sample patterns and confirm your Disallow rules behave the way you think they do.
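
In code terms, a batch run is just the single-path check applied to a list. A sketch reusing the hypothetical isAllowed helper and rules from earlier, with example paths:

  const importantPaths = ["/", "/category/shoes/", "/product/widget-1/", "/blog/launch-post/"];
  const report = importantPaths.map((path) => ({ path, allowed: isAllowed(path, rules) }));
  console.table(report); // confirm every critical page is allowed before release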

How to Read the Results

Decision

“Allowed” means the rules do not block crawling for the bot you selected. “Blocked” means the most specific matching rule is Disallow.

Matched Group

This shows which user-agent group was used. If the bot did not match a specific group, the * group may be used.

Matched Rule

This is the exact directive that decided the result. If nothing matches, the default is usually allowed.

Specificity

This tool uses a practical specificity measure: the longest pattern match (excluding wildcard symbols). That helps explain why one rule beats another.
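
In other words, a pattern's score is its length with the wildcard symbols stripped. A rough sketch of that measure (assumed, for illustration):

  const specificity = (pattern: string): number => pattern.replace(/[*$]/g, "").length;

  specificity("/public/"); // 8
  specificity("/*.pdf$");  // 5 ("/.pdf")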

Limitations You Should Know

Robots behavior can vary by crawler, and not all bots support wildcards or crawl-delay. Some systems also apply extra rules (like server-side blocks, authentication, WAF protection, or geo restrictions) that robots.txt does not represent. Use this tool to validate robots logic, and use server logs and search engine tools to confirm real crawling behavior.

FAQ

Robots.txt Tester – Frequently Asked Questions

Questions about allowed vs blocked results, wildcards, crawl-delay, sitemaps, and how bots interpret rules.

What does a robots.txt tester do?

A robots.txt tester checks a site’s robots rules for a chosen user-agent and tells you whether a specific URL path is allowed or blocked. It also shows which rule matched and why.

How do crawlers interpret robots.txt rules?

Crawlers read the robots.txt file and apply rules inside the best-matching user-agent group. For a URL, the crawler finds the most specific matching rule (often the longest match). If there is a tie, Allow usually wins over Disallow.

If a site has no robots.txt, can it still be crawled?

In general, yes. If robots.txt is missing (often a 404), crawlers typically treat it as allow all. However, pages can still be restricted by other controls such as noindex, authentication, or server rules.

Can robots.txt keep a page out of search results?

Not reliably. robots.txt controls crawling, not indexing. A page can sometimes appear in search results if it is linked elsewhere, even if crawling is blocked. Use noindex (and allow crawling) when the goal is to prevent indexing.

What do * and $ mean in robots.txt rules?

* is a wildcard that can match any sequence of characters. $ is an end marker that means the pattern must match the end of the URL path. Support varies by crawler, but many major crawlers follow this behavior.

Why is my page blocked even though an Allow rule matches it?

It depends on rule specificity. If a Disallow rule matches more specifically than an Allow rule, the Disallow can win. The tester shows the matched rule and its specificity so you can see why.

What is Crawl-delay, and do search engines respect it?

Crawl-delay is a directive some crawlers may respect to slow down requests. Not all major search engines follow it. Rate control is usually handled better via server settings and crawl controls in search engine tools.

What do Sitemap lines in robots.txt do?

Sitemap lines list sitemap URLs to help crawlers discover your structured URL lists. A site can have multiple Sitemap entries, and they can point to sitemap indexes.

Why do different robots.txt testers sometimes disagree?

Differences come from how tools choose the best user-agent group, how they interpret wildcards, and how they break ties between Allow and Disallow. This tool shows its matching logic so you can audit outcomes.

Does this tool store my data?

No. The tool fetches robots.txt on-demand and runs the evaluation in your browser. Your inputs are not stored.

How accurate are the results?

Results are for education and troubleshooting. Real bot behavior can vary by crawler and configuration. Always validate changes with server logs and the search engine tools you rely on.