Robots.txt Generator

Generate a robots.txt file to control search engine crawling

General Settings

Full URL to your XML sitemap

Search Robots

Set individual access rules per robot. "Default" means the robot follows the general setting above.

Google: Googlebot
Google Image: Googlebot-Image
Google Mobile: Googlebot-Mobile
MSN Search: msnbot
Yahoo: Slurp
Yahoo MM: Yahoo-MMCrawler
Yahoo Blogs: Yahoo-Blogs
Ask/Teoma: Teoma
GigaBlast: Gigabot
DMOZ Checker: Robozilla
Nutch: Nutch
Alexa/Wayback: ia_archiver
Baidu: Baiduspider
Naver: Yeti
MSN PicSearch: psbot

Restricted Directories

Each path is relative to the site root and must end with a trailing slash "/".

Generated robots.txt

About Robots.txt Generator

The Robots.txt Generator helps you build a valid robots.txt file with Allow and Disallow rules plus Sitemap references. Use it to control how search engine bots crawl your website, keep crawlers out of low-value or private directories, and make your sitemap discoverable by all major search engines. This is a foundational step in technical SEO that affects how your site is crawled and, in turn, how it is indexed.

Understanding robots.txt

The robots.txt file is a plain text file placed at your website's root that tells search engine crawlers which pages or directories they should or shouldn't access. Every major search engine (Google, Bing, Yahoo, Yandex, Baidu) reads this file before crawling your site. A well-configured robots.txt prevents crawl budget waste on low-value pages (admin panels, search result pages, duplicate content), keeps compliant crawlers out of areas you don't want fetched, and directs bots to your sitemap for efficient discovery of your most important content. It is not an access control mechanism: a disallowed URL can still be requested by anyone, and it may still appear in search results if other pages link to it.

Key Features

  • User-agent rules — create rules for specific bots (Googlebot, Bingbot, GPTBot) or all bots using the wildcard (*).
  • Allow & Disallow directives — specify which paths bots can and cannot crawl on your website.
  • Sitemap references — add sitemap URLs so search engines can discover and index your pages efficiently.
  • Crawl-delay support — set crawl delay values to throttle bot requests and protect server resources.
  • Multiple rule groups — create separate rule sets for different user agents in a single file.
  • Live preview — see the generated robots.txt content update in real-time as you configure rules.

How to Create Your robots.txt

  1. Add user-agent — start with a rule group for * (all bots) or target specific bots like Googlebot.
  2. Set directives — add Allow and Disallow directives with the paths you want to control.
  3. Add sitemap — include your sitemap URL(s) so search engines can discover all your pages.
  4. Download — copy or download the generated robots.txt file and place it at your site's root directory.
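
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the generator's actual implementation: the function name build_robots_txt, its parameters, and the example.com sitemap URL are assumptions made for the example.

```python
def build_robots_txt(groups, sitemaps=()):
    """Render rule groups and sitemap URLs as robots.txt text.

    groups: list of (user_agent, rules) pairs, where rules is a list of
    (directive, path) pairs such as ("Disallow", "/admin/").
    The function name and argument shapes are hypothetical.
    """
    lines = []
    for user_agent, rules in groups:
        lines.append(f"User-agent: {user_agent}")
        for directive, path in rules:
            lines.append(f"{directive}: {path}")
        lines.append("")  # a blank line separates rule groups
    for url in sitemaps:
        lines.append(f"Sitemap: {url}")
    # Normalize to exactly one trailing newline.
    return "\n".join(lines).rstrip("\n") + "\n"


robots_txt = build_robots_txt(
    groups=[
        ("*", [("Disallow", "/admin/"), ("Disallow", "/search")]),
        ("Googlebot", [("Allow", "/")]),
    ],
    sitemaps=["https://example.com/sitemap.xml"],  # placeholder URL
)
print(robots_txt)
```

Writing the returned text to a file named robots.txt and uploading it to the site's root directory completes step 4.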

Essential Directives Explained

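Disallow: /admin/ blocks crawlers from your admin directory. Disallow: /search prevents internal search result pages from being crawled. Allow: /public/ explicitly permits access to a directory within a disallowed parent. Sitemap: tells search engines where to find your XML sitemap. Crawl-delay: (honored by Bing; ignored by Google; support varies among other crawlers) sets the number of seconds between requests. A crawler follows the user-agent group that matches it most specifically. Within that group, Google applies the most specific (longest) matching path rule regardless of the order in which the rules are listed, and when an Allow and a Disallow rule are equally specific it uses the less restrictive one. Some simpler parsers apply the first matching rule instead, so listing more specific rules first keeps behavior consistent across crawlers.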
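
To make the "most specific rule wins" behavior concrete, here is a rough Python sketch of longest-match precedence within one user-agent group. It is a simplification for illustration only: it treats patterns as plain path prefixes (no * or $ wildcards), and the function name is_allowed is an assumption, not part of any crawler's API.

```python
def is_allowed(rules, path):
    """Return True if path may be crawled under one group's rules.

    rules: list of (directive, pattern) pairs, e.g. ("Disallow", "/").
    Simplification: patterns are plain prefixes, wildcards are not handled.
    """
    best_len = -1
    allowed = True  # no matching rule means crawling is allowed
    for directive, pattern in rules:
        if pattern and path.startswith(pattern):
            is_allow = directive.lower() == "allow"
            # The longer pattern wins; on a tie, Allow beats Disallow.
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed


rules = [("Disallow", "/"), ("Allow", "/public/")]
print(is_allowed(rules, "/public/index.html"))  # True: "/public/" is longer than "/"
print(is_allowed(rules, "/admin/"))             # False: only "/" matches

# Reversing the rule order gives the same answers under longest-match.
print(is_allowed(list(reversed(rules)), "/public/index.html"))  # True
```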

Real-World Use Cases

  • Blocking search engines from crawling admin panels, staging pages, login pages, or internal search results.
  • Preventing duplicate content issues by disallowing URL parameters, print versions, or filtered category pages.
  • Adding sitemap references to improve search engine discovery and maximize crawl efficiency.
  • Setting crawl delays for smaller servers that can't handle aggressive bot traffic from multiple crawlers.
  • Blocking AI training bots (GPTBot, CCBot) from scraping your content while still allowing search engine indexing.
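
As a rough illustration of the crawl-delay and sitemap use cases, the sketch below reads both values back from a sample file with Python's standard urllib.robotparser module. The file contents, the 10-second delay, and the example.com URL are placeholders. Python's parser reports the Crawl-delay value whether or not a given real crawler honors it.

```python
from urllib.robotparser import RobotFileParser

# Sample file: the delay value and URLs are placeholders.
robots_txt = """\
User-agent: Bingbot
Crawl-delay: 10
Disallow: /search

User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

bing_delay = parser.crawl_delay("Bingbot")      # delay declared for Bingbot's group
other_delay = parser.crawl_delay("Googlebot")   # falls back to the * group (no delay set)
sitemaps = parser.site_maps()                   # requires Python 3.8 or newer

print(bing_delay)   # 10
print(other_delay)  # None
print(sitemaps)     # ['https://example.com/sitemap.xml']
```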

Frequently Asked Questions

Where should I place the robots.txt file?

The robots.txt file must be placed at the root of your domain (e.g. https://example.com/robots.txt). Search engines only look for it at this exact location. Subdirectory placements are ignored.
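
For illustration, the location a crawler checks can be derived from any page URL by keeping the scheme and host and replacing the path. The helper name robots_txt_url is an assumption made for this sketch. Because the host is part of the URL, each subdomain resolves to its own robots.txt.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; drop the original path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://example.com/blog/post?id=7"))
# https://example.com/robots.txt
print(robots_txt_url("https://shop.example.com/cart"))
# https://shop.example.com/robots.txt
```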

Does robots.txt guarantee pages won't be indexed?

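No. robots.txt is a set of instructions that well-behaved bots follow voluntarily, and it controls crawling, not indexing: a disallowed URL can still be indexed if other pages link to it. To prevent indexing, use the noindex meta tag or X-Robots-Tag HTTP header instead, and leave the page crawlable so that bots can actually see the directive.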
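
As a sketch of the header approach, the minimal WSGI application below attaches X-Robots-Tag: noindex to its response. The app name and page body are hypothetical, and in practice this header is more often set in the web server or framework configuration.

```python
def noindex_app(environ, start_response):
    """Minimal WSGI app whose response carries an X-Robots-Tag: noindex header."""
    body = b"<html><body>Internal report</body></html>"  # placeholder page
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Asks compliant search engines not to index this response.
        ("X-Robots-Tag", "noindex"),
    ]
    start_response("200 OK", headers)
    return [body]
```

For a quick local trial, the app can be passed to wsgiref.simple_server.make_server from the standard library. The URL it serves must not be disallowed in robots.txt, otherwise crawlers never fetch the response and never see the header.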

Can I test my robots.txt rules?

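Yes. Google Search Console includes a robots.txt report that shows the robots.txt files Google found for your site, when they were last crawled, and any warnings or errors. The standalone robots.txt Tester, which checked rules against specific URLs and user agents, has been retired, so to verify individual rules before deploying use a robots.txt parser library or a third-party testing tool.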
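
Rules can also be checked locally before deploying. The sketch below uses Python's standard urllib.robotparser module. The helper name check_rules, the draft rules, and the example.com URLs are assumptions for illustration. Note that this parser applies the first matching rule in a group rather than Google's longest-match logic, so results can differ for files that mix overlapping Allow and Disallow rules.

```python
from urllib.robotparser import RobotFileParser


def check_rules(robots_txt, user_agent, urls):
    """Return {url: bool} showing whether user_agent may fetch each URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


# Draft rules to verify (placeholders).
draft = """\
User-agent: *
Disallow: /admin/
Disallow: /search
"""

results = check_rules(
    draft,
    "Googlebot",
    [
        "https://example.com/admin/users",
        "https://example.com/search?q=shoes",
        "https://example.com/blog/",
    ],
)
for url, allowed in results.items():
    print("allowed" if allowed else "blocked", url)
```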

Should I block CSS and JavaScript files?

No. Google needs access to CSS and JS files to render your pages correctly. Blocking these resources can hurt your search rankings and mobile usability scores.

Can I block specific AI crawlers?

Yes. Add User-agent: GPTBot with Disallow: / to block OpenAI's crawler, or User-agent: CCBot for Common Crawl. Each AI crawler has its own user-agent string.
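
A short sketch of what such a file might look like, with a check that each bot lands in the intended group. The rules shown are an example rather than a recommendation, and the check relies on Python's standard urllib.robotparser, whose matching may differ in detail from any particular crawler's.

```python
from urllib.robotparser import RobotFileParser

# Example file: AI crawlers are blocked site-wide, other bots only from /admin/.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

page = "https://example.com/blog/"  # placeholder URL
print(parser.can_fetch("GPTBot", page))     # False
print(parser.can_fetch("CCBot", page))      # False
print(parser.can_fetch("Googlebot", page))  # True
```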

© glutool. v1.0