Introduction
robots.txt is the first file search engine crawlers check. It tells them which areas they can access. Misconfiguration can block your entire site or waste crawl budget on unimportant pages.
Understanding robots.txt is essential — it controls crawl behavior and helps search engines focus on important pages.
This guide covers syntax, common patterns, and the important distinction between blocking crawling and blocking indexing.
Key Concepts
Basic Syntax
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Sitemap: https://example.com/sitemap.xml
Crawling vs Indexing
robots.txt blocks CRAWLING, not INDEXING. A disallowed page can still appear in search results if other pages link to it; Google just can't read its content, so the listing shows a bare URL without a snippet. To prevent indexing, use a noindex meta tag or an X-Robots-Tag response header, and keep the page crawlable so the crawler can actually see that signal.
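To make the distinction concrete, here is a minimal TypeScript sketch of the two signals that actually prevent indexing (the function names are illustrative, not a real API):

```typescript
// Sketch: the two "don't index" signals, unlike a robots.txt Disallow.

// 1. The meta tag to place in an HTML page's <head>:
function noindexMetaTag(): string {
  return '<meta name="robots" content="noindex, follow">';
}

// 2. The equivalent HTTP response header, useful for non-HTML
//    resources like PDFs where a meta tag isn't possible:
function noindexHeader(): Record<string, string> {
  return { 'X-Robots-Tag': 'noindex, follow' };
}

console.log(noindexMetaTag());
console.log(noindexHeader()['X-Robots-Tag']); // noindex, follow
```

Either signal tells the crawler "you may read this, but don't list it", which is the opposite trade-off from a Disallow rule.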
Practical Examples
1. Next.js robots.txt
// app/robots.ts (App Router: served automatically at /robots.txt)
import type { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [{ userAgent: '*', allow: '/', disallow: ['/admin/', '/api/'] }],
    sitemap: 'https://example.com/sitemap.xml',
  };
}
2. Environment-Aware
// app/robots.ts
import type { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  // Block everything outside production so staging deployments never get crawled.
  if (process.env.NODE_ENV !== 'production') {
    return { rules: { userAgent: '*', disallow: '/' } };
  }
  return { rules: { userAgent: '*', allow: '/' }, sitemap: 'https://example.com/sitemap.xml' };
}
3. Block Specific Bots
User-agent: AhrefsBot
Disallow: /
User-agent: *
Disallow: /api/
Disallow: /*?sort=
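If you generate robots.txt from code, per-bot rules like the ones above can be assembled with a small helper. This is a plain-TypeScript sketch (the `RobotsRule` type and `buildRobotsTxt` name are illustrative, not a library API):

```typescript
// Sketch: serialize per-bot rules into robots.txt groups.
type RobotsRule = { userAgent: string; disallow: string | string[] };

function buildRobotsTxt(rules: RobotsRule[]): string {
  return rules
    .map((r) => {
      const paths = Array.isArray(r.disallow) ? r.disallow : [r.disallow];
      // One group per user agent: a User-agent line, then its Disallow lines.
      return [`User-agent: ${r.userAgent}`, ...paths.map((p) => `Disallow: ${p}`)].join('\n');
    })
    .join('\n\n'); // blank line between groups
}

console.log(
  buildRobotsTxt([
    { userAgent: 'AhrefsBot', disallow: '/' },
    { userAgent: '*', disallow: ['/api/', '/*?sort='] },
  ])
);
```

Generating the file keeps the bot list in one place instead of hand-editing a static asset.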
Best Practices
- ✅ Keep robots.txt simple and organized
- ✅ Include Sitemap directive
- ✅ Block admin, API, and preview routes
- ✅ Test with the robots.txt report in Google Search Console (the standalone robots.txt Tester has been retired)
- ✅ Use noindex for pages you don't want indexed
- ❌ Don't block CSS or JavaScript files
- ❌ Don't rely on robots.txt for security
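Testing can also be automated. As a rough sketch (the function name is made up, and it deliberately ignores multi-line User-agent groups), a deploy-time smoke test can fetch /robots.txt and fail the build if the wildcard group blocks the whole site:

```typescript
// Sketch: return true if the "*" group contains a bare "Disallow: /",
// i.e. the file blocks all compliant crawlers from the entire site.
// Simplified: assumes one User-agent line per group.
function blocksEverything(body: string): boolean {
  let inStarGroup = false;
  for (const raw of body.split('\n')) {
    const line = raw.trim().toLowerCase();
    if (line.startsWith('user-agent:')) {
      inStarGroup = line.slice('user-agent:'.length).trim() === '*';
    } else if (inStarGroup && line === 'disallow: /') {
      return true;
    }
  }
  return false;
}

console.log(blocksEverything('User-agent: *\nDisallow: /'));     // true
console.log(blocksEverything('User-agent: *\nDisallow: /api/')); // false
```

A check like this catches the classic accident of shipping a staging robots.txt to production.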
Common Pitfalls
- 🚫 Blocking CSS/JS — Google can't render pages
- 🚫 Using Disallow when you mean noindex
- 🚫 Forgetting robots.txt on staging
- 🚫 Overly broad Disallow rules
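The last pitfall is easy to reason about with a quick check. This is a rough prefix-only sketch (real crawlers also honor Allow rules and the `*` and `$` wildcards, and pick the longest matching rule), but it shows how a missing trailing slash widens a rule:

```typescript
// Sketch: prefix-only check of a URL path against Disallow rules.
// Simplified: no Allow rules, no wildcards, no longest-match precedence.
function isDisallowed(path: string, disallow: string[]): boolean {
  return disallow.some((rule) => rule !== '' && path.startsWith(rule));
}

// "Disallow: /admin" (no trailing slash) matches more than intended:
console.log(isDisallowed('/administration/help', ['/admin']));  // true
console.log(isDisallowed('/administration/help', ['/admin/'])); // false
```

Writing rules with trailing slashes (`/admin/` rather than `/admin`) keeps the match scoped to the directory you actually mean.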