If you searched for robots.txt examples, you probably do not need theory first. You need safe patterns you can copy, adapt, and test before they block the wrong pages.
The biggest mistake teams make with robots.txt is treating it like a deindex button. It is not. robots.txt controls crawling, not guaranteed index removal.
If you want to validate a live or draft file against a real URL, use the Robots.txt Tester. If you need to draft a file first, use the Robots.txt Generator. For the broader crawlability workflow, keep the Technical SEO Audit guide and SEO Website Audit Checklist nearby.
What robots.txt can and cannot do
Robots.txt can help you:
- block crawlers from low-value sections
- prevent staging or internal search pages from being crawled
- reduce crawl waste on parameter-heavy areas
- publish sitemap locations
Robots.txt cannot reliably:
- remove already-known URLs from search on its own
- replace a proper noindex strategy where indexing control is needed
- fix duplicate content by itself
- protect private content from real access
If content is sensitive, use authentication. Do not rely on robots.txt alone.
Example 1: Simple open site with sitemap
Use this when you want normal crawling and just need a clean baseline file.
User-agent: *
Disallow:
Sitemap: https://www.example.com/sitemap.xml
Why it works:
- nothing important is blocked
- the sitemap is easy for crawlers to discover
- the file is easy to audit later
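You can sanity-check a baseline like this before publishing with Python's standard-library robots.txt parser (a sketch; the domain and paths are placeholders):

```python
from urllib.robotparser import RobotFileParser

BASELINE = """\
User-agent: *
Disallow:
Sitemap: https://www.example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(BASELINE.splitlines())

# An empty Disallow means every path stays crawlable.
print(rp.can_fetch("*", "https://www.example.com/products/widget"))  # True
# site_maps() (Python 3.8+) surfaces the published sitemap lines.
print(rp.site_maps())  # ['https://www.example.com/sitemap.xml']
```

Parsing from a string keeps the check offline, so it can run before the file is ever deployed.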
Example 2: Block staging completely
Use this only on staging or pre-production environments.
User-agent: *
Disallow: /
Important:
- this should never survive a production launch
- staging should also be protected with login or IP controls
If a launch went live with Disallow: /, treat that as a release issue and fix it immediately.
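A minimal release-gate check can catch this class of mistake: feed the fetched production file body to the stdlib parser and assert that core pages are crawlable (a sketch; the staging hostname is a placeholder):

```python
from urllib.robotparser import RobotFileParser

STAGING = """\
User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(STAGING.splitlines())

# Disallow: / blocks every path, including the homepage, so a CI gate
# asserting can_fetch(...) is True on production would fail loudly here.
print(rp.can_fetch("*", "https://staging.example.com/"))         # False
print(rp.can_fetch("*", "https://staging.example.com/pricing"))  # False
```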
Example 3: Block internal search pages
Internal search result pages usually do not need to be crawled by search engines.
User-agent: *
Disallow: /search
Disallow: /?s=
Sitemap: https://www.example.com/sitemap.xml
Use this when:
- site search pages create thin or duplicate combinations
- filters or query pages expand infinitely
- you want crawl budget focused on canonical pages
Test exact URLs before publishing. A bad rule here can accidentally catch intended landing pages or faceted navigation routes.
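One quick way to see that risk is to run candidate URLs through the stdlib parser, which applies the same prefix matching these rules rely on (a sketch; the URLs are placeholders):

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /search
Disallow: /?s=
"""

rp = RobotFileParser()
rp.parse(RULES.splitlines())

base = "https://www.example.com"
print(rp.can_fetch("*", base + "/search?q=shoes"))        # False: intended
print(rp.can_fetch("*", base + "/?s=shoes"))              # False: intended
# Prefix matching means /search also catches sibling paths:
print(rp.can_fetch("*", base + "/search-console-guide"))  # False: maybe not intended
print(rp.can_fetch("*", base + "/blog/search-tips"))      # True: unaffected
```

The third result is exactly the kind of surprise that testing before deploy is meant to surface.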
Example 4: Block admin and login areas
User-agent: *
Disallow: /admin/
Disallow: /login/
Disallow: /checkout/
This is common and usually safe, but remember:
- blocked URLs can still be discovered from links
- private sections still need real access control
Example 5: Faceted navigation with a narrow allow pattern
This is where teams often get into trouble.
User-agent: *
Disallow: /collections/*?color=
Disallow: /collections/*?size=
Allow: /collections/new-arrivals
Sitemap: https://www.example.com/sitemap.xml
This kind of rule can help on parameter-heavy ecommerce sites, but only when:
- the blocked combinations are truly low value
- the allowed landing pages are intentional canonical pages
- you have tested exact URLs with the Robots.txt Tester
Do not block filter paths blindly. Some faceted URLs may be valuable landing pages.
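Note that Python's `urllib.robotparser` does plain prefix matching and does not implement `*` or `$` wildcards, so it cannot test rules like the ones above. The sketch below is a deliberately simplified matcher in the style of Google's documented behavior (longest matching pattern wins, allow wins ties); the function names are ours, and real crawlers have more edge cases:

```python
import re

def rule_matches(pattern, path):
    # Translate robots.txt wildcards into a regex: '*' matches any
    # sequence of characters, a trailing '$' anchors the end of the path.
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"
    return re.match(regex, path) is not None

def is_allowed(rules, path):
    # rules: list of ("allow" | "disallow", pattern) pairs.
    # Precedence: the longest matching pattern wins; on a length tie,
    # allow wins. No matching rule means the path is allowed.
    best = ("allow", "")
    for kind, pattern in rules:
        if rule_matches(pattern, path) and len(pattern) >= len(best[1]):
            if len(pattern) > len(best[1]) or kind == "allow":
                best = (kind, pattern)
    return best[0] == "allow"

rules = [
    ("disallow", "/collections/*?color="),
    ("disallow", "/collections/*?size="),
    ("allow", "/collections/new-arrivals"),
]
print(is_allowed(rules, "/collections/shoes?color=red"))  # False
print(is_allowed(rules, "/collections/new-arrivals"))     # True
print(is_allowed(rules, "/collections/shoes"))            # True
```

Under longest-match precedence, `/collections/new-arrivals?color=red` also stays allowed, because the allow pattern is longer than the disallow pattern it collides with; that is the behavior the narrow allow line is counting on.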
Example 6: Multi-bot rules
User-agent: Googlebot
Disallow:
User-agent: *
Disallow: /tmp/
Disallow: /search
Sitemap: https://www.example.com/sitemap.xml
This is useful only when you have a very specific reason to treat bots differently. Most sites should keep the file simple unless there is a clear crawl-management need.
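Per-bot grouping can also be verified offline: the stdlib parser picks the most specific user-agent group, falling back to `*` (a sketch; the URL is a placeholder):

```python
from urllib.robotparser import RobotFileParser

MULTI_BOT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /tmp/
Disallow: /search
"""

rp = RobotFileParser()
rp.parse(MULTI_BOT.splitlines())

url = "https://www.example.com/tmp/report.csv"
print(rp.can_fetch("Googlebot", url))     # True: the Googlebot group applies
print(rp.can_fetch("SomeOtherBot", url))  # False: falls back to the * group
```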
Common robots.txt mistakes
1. Using robots.txt to try to deindex pages
If a URL is already known externally, search engines can keep it indexed without crawling its content.
Use noindex where appropriate and remove internal references when the goal is index cleanup.
2. Leaving Disallow: / live after launch
This happens more often than teams admit, especially after rushed launches.
Check production robots.txt as part of your Website Launch Checklist, not just staging QA.
3. Blocking assets that pages need to render
If important CSS, JS, or media routes are blocked, crawlers may not fully understand the page layout or content.
Keep rendering resources crawlable unless you have a strong reason not to.
4. Publishing rules without testing exact URLs
Pattern assumptions are where most mistakes happen.
Always test:
- one intended allowed URL
- one intended blocked URL
- one edge-case URL near the same pattern
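Those three checks can be scripted as a small pre-deploy test (a sketch; the helper name, draft file, and URLs are placeholders you would swap for your own):

```python
from urllib.robotparser import RobotFileParser

def check_rules(robots_txt, expectations):
    """Compare a draft robots.txt against (url, should_be_allowed) pairs."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    failures = []
    for url, should_allow in expectations:
        if rp.can_fetch("*", url) != should_allow:
            failures.append(url)
    return failures

draft = "User-agent: *\nDisallow: /search\n"
failures = check_rules(draft, [
    ("https://www.example.com/pricing", True),      # intended allowed URL
    ("https://www.example.com/search?q=a", False),  # intended blocked URL
    ("https://www.example.com/search-tips", True),  # edge case near the pattern
])
print(failures)  # the edge case fails: /search-tips is caught by /search
```

A non-empty failure list is a signal to narrow the pattern before deploy, not after.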
5. Treating crawl-delay like a Google control
Crawl-delay is not a reliable Google lever. Keep it out unless you know another bot in your environment needs it.
10-minute robots.txt QA workflow
| Step | What to test | Pass condition |
|---|---|---|
| 1 | Production file loads | /robots.txt returns 200 and plain text |
| 2 | Core pages | Important pages are crawlable |
| 3 | Low-value paths | Internal search, admin, or staging patterns are blocked as intended |
| 4 | Sitemaps | Correct sitemap lines are present |
| 5 | Launch safety | No accidental Disallow: / on production |
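Steps 4 and 5 can be checked offline once you have fetched the production file body (a sketch; the function name is ours and the checks are intentionally minimal):

```python
def robots_launch_check(robots_txt):
    """Flag launch-safety issues (steps 4 and 5 in the table above)."""
    issues = []
    # Strip comments and whitespace before matching directives.
    lines = [line.split("#", 1)[0].strip() for line in robots_txt.splitlines()]
    if not any(line.lower().startswith("sitemap:") for line in lines):
        issues.append("missing Sitemap line")
    for line in lines:
        key, _, value = line.partition(":")
        # A bare "Disallow: /" blocks the whole site for its group.
        if key.strip().lower() == "disallow" and value.strip() == "/":
            issues.append("site-wide Disallow: / present")
    return issues

print(robots_launch_check("User-agent: *\nDisallow: /\n"))
# ['missing Sitemap line', 'site-wide Disallow: / present']
print(robots_launch_check(
    "User-agent: *\nDisallow:\nSitemap: https://www.example.com/sitemap.xml\n"
))  # []
```

Wiring a check like this into the deploy pipeline turns step 5 from a manual reminder into an automatic gate.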
Use the Robots.txt Tester for live validation and the Robots.txt Generator to rebuild a cleaner file if the current one is messy.
Safe launch checklist for robots.txt
- block staging, not production
- keep primary revenue pages crawlable
- test one real URL per rule before deploy
- publish the production sitemap line
- check the file again right after go-live
If your launch includes URL changes and redirects, pair this with How to Find Redirect Chains After a Website Migration and 301 vs 302 Redirects: When to Use Each and How to Test Them.
FAQ
Should I block internal search pages in robots.txt?
Often yes, especially when those pages create thin, duplicate, or infinite crawl paths.
Should I block tag pages or filter pages?
Only if they are genuinely low value and not part of your organic strategy. Test specific paths before publishing broad rules.
Can I use robots.txt to hide a staging site?
Use it as one layer, but staging should also be protected with authentication or IP restrictions.
What is the safest default robots.txt file?
The safest default for most live sites is a simple open file plus a sitemap line:
User-agent: *
Disallow:
Sitemap: https://www.example.com/sitemap.xml
Final rule
Good robots.txt files are short, intentional, and tested.
If a rule exists, you should be able to explain:
- what exact URLs it is meant to affect
- why those URLs should be blocked or allowed
- how you tested the rule before deploy
If you cannot answer those three questions, simplify the file and test it again.