Introduction
XML sitemaps are a direct communication channel with search engines. They tell crawlers which pages exist, when each was last updated, and how important each is relative to other pages on the same site.
Sitemaps are especially critical for large sites, sites with poor internal linking, new sites, and sites with frequently changing content.
This guide covers sitemap generation, dynamic sitemaps, and proper submission to search engines.
Key Concepts
XML Sitemap Format
xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-02-01</lastmod>
    <priority>1.0</priority>
  </url>
</urlset>
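To make the format concrete, here is a minimal sketch of producing such a file from a list of entries. The `SitemapEntry` shape and the function names `escapeXml` and `buildSitemapXml` are illustrative assumptions, not part of any library. The sketch entity-escapes URLs, which the sitemap protocol requires for characters such as `&`.

```typescript
// Hypothetical entry shape for illustration; field names mirror the XML tags.
interface SitemapEntry {
  loc: string;
  lastmod?: Date;
  priority?: number;
}

// Entity-escape the five characters that are special in XML.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Render a <urlset> document; optional tags are emitted only when set.
function buildSitemapXml(entries: SitemapEntry[]): string {
  const urls = entries.map((e) => {
    const lines = [`    <loc>${escapeXml(e.loc)}</loc>`];
    if (e.lastmod) {
      // YYYY-MM-DD, taken from the UTC form of the timestamp.
      lines.push(`    <lastmod>${e.lastmod.toISOString().slice(0, 10)}</lastmod>`);
    }
    if (e.priority !== undefined) {
      lines.push(`    <priority>${e.priority.toFixed(1)}</priority>`);
    }
    return `  <url>\n${lines.join('\n')}\n  </url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    '</urlset>',
  ].join('\n');
}
```

Calling `buildSitemapXml([{ loc: 'https://example.com/', lastmod: new Date('2024-02-01T00:00:00Z'), priority: 1 }])` should yield a document equivalent to the hand-written one above.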
Sitemap Index
xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-blog.xml</loc></sitemap>
</sitemapindex>
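The index format pairs naturally with chunking: split the full URL list into files that stay under the 50,000-URL limit, then list those files in the index. The following is a rough sketch under stated assumptions. The `sitemap-N.xml` naming scheme and the names `splitIntoSitemaps` and `buildSitemapIndexXml` are hypothetical, and `baseUrl` is assumed to have no trailing slash and no characters that need XML escaping.

```typescript
// Protocol limit on the number of URLs in a single sitemap file.
const MAX_URLS_PER_SITEMAP = 50000;

interface SitemapFile {
  name: string; // hypothetical naming scheme: sitemap-1.xml, sitemap-2.xml, ...
  urls: string[];
}

// Split a flat URL list into consecutive chunks of at most `size` URLs.
function splitIntoSitemaps(
  urls: string[],
  size: number = MAX_URLS_PER_SITEMAP,
): SitemapFile[] {
  const files: SitemapFile[] = [];
  for (let i = 0; i < urls.length; i += size) {
    files.push({
      name: `sitemap-${files.length + 1}.xml`,
      urls: urls.slice(i, i + size),
    });
  }
  return files;
}

// Render the <sitemapindex> document that points at each chunk file.
function buildSitemapIndexXml(baseUrl: string, files: SitemapFile[]): string {
  const items = files.map(
    (f) => `  <sitemap><loc>${baseUrl}/${f.name}</loc></sitemap>`,
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...items,
    '</sitemapindex>',
  ].join('\n');
}
```

Each `SitemapFile` would then be written out as its own `<urlset>` document, with the index served at a stable URL.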
Practical Examples
1. Next.js Dynamic Sitemap
typescript
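// app/sitemap.ts
import type { MetadataRoute } from 'next';
// Assumed helper and path: point this at wherever your post loader lives.
import { getAllPosts } from '@/lib/posts';

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const posts = await getAllPosts();
  return [
    // Caution: new Date() marks the homepage as modified on every build.
    { url: 'https://example.com', lastModified: new Date(), priority: 1.0 },
    ...posts.map((p) => ({
      url: `https://example.com/blog/${p.slug}`,
      lastModified: p.updatedAt, // the post's own modification time
      priority: 0.7,
    })),
  ];
}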
2. next-sitemap Package
javascript
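// next-sitemap.config.js
/** @type {import('next-sitemap').IConfig} */
module.exports = {
  siteUrl: 'https://example.com',
  generateRobotsTxt: true, // also write a robots.txt that references the sitemap
  sitemapSize: 7000, // max URLs per file before splitting into an index
  exclude: ['/admin/*', '/api/*'],
};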
Best Practices
- ✅ Include only canonical, indexable URLs
- ✅ Keep lastmod accurate — update only when content changes
- ✅ Split large sitemaps with a sitemap index
- ✅ Reference the sitemap in robots.txt with a Sitemap: line
- ✅ Submit via Google Search Console
- ❌ Don't include noindex pages or redirects
- ❌ Don't set all priorities to 1.0
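The first and sixth rules above can be enforced in code before the sitemap is written. Here is a minimal sketch, assuming each page is described by a hypothetical `PageRecord` carrying its canonical URL, noindex flag, and HTTP status; none of these names come from a real library.

```typescript
// Hypothetical page metadata; how you obtain it depends on your CMS or crawler.
interface PageRecord {
  url: string;
  canonicalUrl: string; // target of the page's rel="canonical"
  noindex: boolean; // true if the page sends a noindex directive
  statusCode: number; // HTTP status returned for `url`
}

// A page belongs in the sitemap only if it resolves directly (no redirect
// or error), is indexable, and is its own canonical URL.
function isSitemapEligible(page: PageRecord): boolean {
  return page.statusCode === 200 && !page.noindex && page.url === page.canonicalUrl;
}

// Reduce a list of pages to the URLs that should appear in the sitemap.
function selectSitemapUrls(pages: PageRecord[]): string[] {
  return pages.filter(isSitemapEligible).map((p) => p.url);
}
```

The strict string comparison of `url` and `canonicalUrl` is a deliberate simplification; a real pipeline would probably normalize trailing slashes and letter case first.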
Common Pitfalls
- 🚫 Stale lastmod dates — use actual modification dates
- 🚫 Including non-canonical URLs
- 🚫 Exceeding 50,000 URLs or 50 MB (uncompressed) per sitemap file
- 🚫 Not monitoring crawl stats in Search Console
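Some of these pitfalls can be caught with a quick check before deployment. The sketch below is illustrative only. `auditSitemap` and `AuditEntry` are hypothetical names, and the "identical lastmod" rule is a heuristic assumption: a shared timestamp across all entries often means a build-time stamp rather than real modification dates.

```typescript
interface AuditEntry {
  loc: string;
  lastmod?: Date;
}

// Return human-readable warnings; an empty array means no issue was detected.
function auditSitemap(entries: AuditEntry[], now: Date = new Date()): string[] {
  const warnings: string[] = [];

  // Protocol limit: at most 50,000 URLs per sitemap file.
  if (entries.length > 50000) {
    warnings.push(`Too many URLs: ${entries.length} (limit is 50000)`);
  }

  // Collect the lastmod values that are actually present.
  const dated: Date[] = [];
  for (const e of entries) {
    if (e.lastmod) dated.push(e.lastmod);
  }

  // A lastmod later than "now" cannot be a real modification date.
  const futureCount = dated.filter((d) => d.getTime() > now.getTime()).length;
  if (futureCount > 0) {
    warnings.push(`${futureCount} lastmod value(s) lie in the future`);
  }

  // Heuristic: every entry sharing one timestamp suggests a build-time stamp.
  const first = dated[0];
  if (
    first !== undefined &&
    dated.length > 1 &&
    dated.every((d) => d.getTime() === first.getTime())
  ) {
    warnings.push('All lastmod values are identical; they may be build-time stamps');
  }

  return warnings;
}
```

Running this in CI against the generated entries gives an early signal, though it does not replace watching crawl stats in Search Console.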