Robots.txt Generator
Generate a robots.txt file to control search engine crawling
General Settings
Full URL to your XML sitemap
Search Robots
Set individual access rules per robot. "Default" means the robot follows the general setting above.
Restricted Directories
The path is relative to root and must contain a trailing slash "/"
About Robots.txt Generator
The Robots.txt Generator helps you build a valid robots.txt file with Allow and Disallow rules
plus sitemap references. Control how search engine bots crawl your website, keep crawlers out of
directories that should not be crawled, and make your sitemap discoverable by all major search engines.
This is a foundational step in technical SEO that shapes how your site is crawled and, in turn, indexed.
Understanding robots.txt
The robots.txt file is a plain text file placed at your website's root that tells search
engine crawlers which pages or directories they should or shouldn't access. Every major search engine
(Google, Bing, Yahoo, Yandex, Baidu) reads this file before crawling your site. A well-configured
robots.txt prevents crawl budget waste on low-value pages (admin panels, search result pages,
duplicate content), keeps compliant crawlers out of areas you do not want crawled, and directs bots to your
sitemap for efficient discovery of your most important content. It is not an access control mechanism,
and on its own it does not guarantee that a blocked URL stays out of search indexes.
Key Features
- User-agent rules: create rules for specific bots (Googlebot, Bingbot, GPTBot) or all bots using the wildcard (*).
- Allow & Disallow directives: specify which paths bots can and cannot crawl on your website.
- Sitemap references: add sitemap URLs so search engines can discover and index your pages efficiently.
- Crawl-delay support: set crawl delay values to throttle bot requests and protect server resources.
- Multiple rule groups: create separate rule sets for different user agents in a single file.
- Live preview: see the generated robots.txt content update in real time as you configure rules.
How to Create Your robots.txt
- Add user-agent: start with a rule group for * (all bots) or target specific bots like Googlebot.
- Set directives: add Allow and Disallow directives with the paths you want to control.
- Add sitemap: include your sitemap URL(s) so search engines can discover all your pages.
- Download: copy or download the generated robots.txt file and place it at your site's root directory.
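The steps above can be sketched in a few lines of Python. This is only an illustration of the file layout, not the generator's actual code: the RuleGroup class, the build_robots_txt function, and the example.com URL are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class RuleGroup:
    """One user-agent group (hypothetical structure, for illustration only)."""
    user_agents: List[str]
    allow: List[str] = field(default_factory=list)
    disallow: List[str] = field(default_factory=list)
    crawl_delay: Optional[int] = None


def build_robots_txt(groups: Sequence[RuleGroup], sitemaps: Sequence[str] = ()) -> str:
    """Render rule groups and sitemap URLs as robots.txt text."""
    lines: List[str] = []
    for group in groups:
        for agent in group.user_agents:
            lines.append(f"User-agent: {agent}")
        # Allow lines go first so that parsers using file order see them first.
        for path in group.allow:
            lines.append(f"Allow: {path}")
        for path in group.disallow:
            lines.append(f"Disallow: {path}")
        if not group.allow and not group.disallow:
            lines.append("Disallow:")  # an empty Disallow blocks nothing
        if group.crawl_delay is not None:
            lines.append(f"Crawl-delay: {group.crawl_delay}")
        lines.append("")  # a blank line separates groups
    for url in sitemaps:
        lines.append(f"Sitemap: {url}")
    return "\n".join(lines).rstrip("\n") + "\n"


robots = build_robots_txt(
    [
        RuleGroup(user_agents=["*"], disallow=["/admin/", "/search"]),
        RuleGroup(user_agents=["GPTBot"], disallow=["/"]),
    ],
    sitemaps=["https://example.com/sitemap.xml"],
)
print(robots)
```

Save the resulting text as robots.txt and upload it to the root of your site.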
Essential Directives Explained
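- Disallow: /admin/ blocks crawlers from your admin directory.
- Disallow: /search keeps crawlers away from internal search result pages (any path that begins with /search). It controls crawling, not indexing.
- Allow: /public/ explicitly permits access to a directory. It is most useful for re-opening a path that sits inside a disallowed parent.
- Sitemap: tells search engines where to find your XML sitemap. It takes a full URL and applies to the whole file, not to a single user-agent group.
- Crawl-delay: sets the number of seconds between requests. Bing supports it and Google ignores it. Support in other crawlers varies, so check each crawler's documentation.
A crawler follows the group whose User-agent line matches it most specifically and falls back to the * group. Within that group, Google does not rely on file order: the rule with the longest matching path wins, and when an Allow and a Disallow match equally, the less restrictive Allow applies. Some other parsers apply the first matching rule in file order, so listing the more specific rules first is the safest layout.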
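To make the precedence rule concrete, here is a minimal sketch of longest-match evaluation for a single user-agent group. It is a simplification under stated assumptions: it handles plain path prefixes only (no * or $ wildcards), and the is_allowed function and the /files/ paths are hypothetical names used for illustration.

```python
from typing import List, Tuple


def is_allowed(path: str, rules: List[Tuple[str, str]]) -> bool:
    """Return True if `path` may be crawled under `rules`.

    `rules` holds ("allow" | "disallow", path_prefix) pairs for one group.
    """
    best_len = -1
    allowed = True  # a path with no matching rule may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty values and non-matching prefixes are skipped
        is_allow = directive.lower() == "allow"
        # The longer prefix wins; on a tie, Allow (the less restrictive rule) wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [("disallow", "/files/"), ("allow", "/files/public/")]

print(is_allowed("/files/public/guide.pdf", rules))
print(is_allowed("/files/secret.pdf", rules))
print(is_allowed("/blog/", rules))
```

Because the decision depends on prefix length rather than position, reversing the order of the two rules gives the same answers.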
Real-World Use Cases
- Blocking search engines from crawling admin panels, staging pages, login pages, or internal search results.
- Preventing duplicate content issues by disallowing URL parameters, print versions, or filtered category pages.
- Adding sitemap references to improve search engine discovery and maximize crawl efficiency.
- Setting crawl delays for smaller servers that can't handle aggressive bot traffic from multiple crawlers.
- Blocking AI training bots (GPTBot, CCBot) from scraping your content while still allowing search engine indexing.
Frequently Asked Questions
Where should I place the robots.txt file?
The robots.txt file must be placed at the root of your domain (e.g. https://example.com/robots.txt). Search engines only look for it at this exact location. Subdirectory placements are ignored.
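As a small illustration, the robots.txt location can be derived from any page URL by keeping the scheme, host, and port and replacing everything else with /robots.txt. The robots_url helper below is a hypothetical name, and the URLs are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host[:port]; drop the path, query string, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/blog/post?id=1"))
print(robots_url("https://shop.example.com:8443/cart/checkout"))
```

Each subdomain (and each port) resolves to its own file, so shop.example.com needs a robots.txt separate from the one on example.com.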
Does robots.txt guarantee pages won't be indexed?
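No. robots.txt only asks well-behaved bots not to fetch a URL. A blocked page can still be indexed, usually without a description, if other sites link to it. To prevent indexing, use the noindex meta tag or X-Robots-Tag HTTP header instead, and leave the page crawlable so that crawlers can actually see the directive.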
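Both noindex options look roughly like this. The sketch uses a bare WSGI application from the Python standard library conventions purely as an example of setting the header; the noindex_app function and the page content are hypothetical, and your own server or framework will have its own way to add response headers.

```python
# Option 1: a meta tag placed inside the page's <head>.
NOINDEX_META = '<meta name="robots" content="noindex">'


# Option 2: an HTTP response header, which also works for PDFs and images.
def noindex_app(environ, start_response):
    """Minimal WSGI app (hypothetical) that serves a page marked noindex."""
    body = b"<!doctype html><title>Internal report</title>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        ("X-Robots-Tag", "noindex"),
    ]
    start_response("200 OK", headers)
    return [body]


# Call the app once with a stub start_response to inspect what it sends.
captured = {}


def record_response(status, response_headers):
    captured["status"] = status
    captured["headers"] = dict(response_headers)


response_body = b"".join(noindex_app({}, record_response))
print(captured["headers"]["X-Robots-Tag"])
```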
Can I test my robots.txt rules?
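Yes. Google Search Console provides a robots.txt report that shows the robots.txt files Google found for your site, when they were last fetched, and any parsing errors or warnings. The older standalone robots.txt Tester has been retired, so to check specific URLs against specific user agents before deploying, use a robots.txt parser library or a third-party testing tool.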
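One way to check rules locally is Python's standard urllib.robotparser module. The sketch below parses a draft file from a string and tests a few URLs; the rules and example.com URLs are placeholders. Note that this parser applies the first matching rule in file order, so its answers can differ from Google's longest-match behaviour when Allow and Disallow rules overlap.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse the draft without any network access

urls = [
    "https://example.com/blog/hello",
    "https://example.com/admin/users",
    "https://example.com/search?q=shoes",
]
# Map each URL to whether Googlebot would be allowed to fetch it.
results = {url: parser.can_fetch("Googlebot", url) for url in urls}

for url, allowed in results.items():
    print("ALLOW" if allowed else "BLOCK", url)

print("Crawl-delay for Bingbot:", parser.crawl_delay("Bingbot"))
print("Sitemaps:", parser.site_maps())  # site_maps() requires Python 3.8+
```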
Should I block CSS and JavaScript files?
No. Google needs access to CSS and JS files to render your pages correctly. Blocking these resources can hurt your search rankings and mobile usability scores.
Can I block specific AI crawlers?
Yes. Add User-agent: GPTBot with Disallow: / to block OpenAI's crawler, or User-agent: CCBot for Common Crawl. Each AI crawler has its own user-agent string.
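A possible layout for such a file, checked with Python's standard urllib.robotparser, is sketched below. The example.com URL is a placeholder, and the check only confirms how a compliant parser reads the rules; it cannot confirm that a given crawler actually obeys them.

```python
from urllib.robotparser import RobotFileParser

# Two AI crawlers share one group; every other bot falls through to "*".
AI_BLOCK_RULES = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Disallow:
"""

ai_parser = RobotFileParser()
ai_parser.parse(AI_BLOCK_RULES.splitlines())

page = "https://example.com/articles/launch"
decisions = {
    agent: ai_parser.can_fetch(agent, page)
    for agent in ("GPTBot", "CCBot", "Googlebot", "Bingbot")
}

for agent, allowed in decisions.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```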