Sitemap XML Inspector

Inspect sitemap XML for duplicates, invalid dates, non-absolute URLs, and protocol mixing issues

What is the Sitemap XML Inspector?

The Sitemap XML Inspector is a client-side tool that analyzes an XML sitemap for common issues that can reduce crawling and indexing efficiency. It detects duplicate URLs that waste crawl budget, validates lastmod dates against the W3C Datetime format, identifies non-absolute URLs that crawlers cannot process, flags HTTP/HTTPS protocol mixing that can cause duplicate content problems, and checks changefreq and priority values. The result is a quality score with actionable recommendations.

How to Use

  1. Paste your sitemap XML content into the input field
  2. Click "Inspect" or wait for automatic processing
  3. Review the score, URL count, and detected findings
  4. Check specific issues: duplicates, invalid dates, non-absolute URLs, protocol mixing
  5. Export a full report in JSON or Markdown format for sharing with your team

Example: Sitemap with Issues

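This sitemap has four problems the inspector will detect: a duplicate URL (https://example.com/page appears twice), a lastmod value that is not in W3C Datetime format (March 15, 2024), an http:// URL mixed in with https:// URLs, and a relative URL (/about):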

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page</loc>
    <lastmod>2024-03-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/page</loc>
    <lastmod>March 15, 2024</lastmod>
  </url>
  <url>
    <loc>http://example.com/blog</loc>
  </url>
  <url>
    <loc>/about</loc>
  </url>
</urlset>
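
To make the example concrete, here is a minimal Python sketch that pulls the loc and lastmod values out of the sitemap above. It is an illustration only, not the inspector's own code (the inspector runs in the browser with DOMParser); the names EXAMPLE and extract_entries are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Sitemap protocol, in ElementTree's {uri}tag notation.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# The example sitemap from this page, copied verbatim.
EXAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page</loc>
    <lastmod>2024-03-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/page</loc>
    <lastmod>March 15, 2024</lastmod>
  </url>
  <url>
    <loc>http://example.com/blog</loc>
  </url>
  <url>
    <loc>/about</loc>
  </url>
</urlset>"""


def extract_entries(xml_text):
    """Return a list of (loc, lastmod) tuples; lastmod is None when absent."""
    # Parse from bytes so the XML declaration's encoding is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.iter(SITEMAP_NS + "url"):
        loc = url.findtext(SITEMAP_NS + "loc", default="").strip()
        lastmod = url.findtext(SITEMAP_NS + "lastmod")
        entries.append((loc, lastmod.strip() if lastmod else None))
    return entries


for loc, lastmod in extract_entries(EXAMPLE):
    print(loc, lastmod)
```

The four extracted entries are the raw material for the checks described in the rest of this page.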

What Issues Are Detected?

  • Duplicate URLs: the same URL listed more than once can waste crawl budget and confuse search engines about the canonical version
  • Invalid lastmod dates: dates must use the W3C Datetime format (a profile of ISO 8601). Crawlers may ignore values in other formats, which reduces the usefulness of your sitemap
  • Non-absolute URLs: every sitemap URL must be fully qualified with protocol and domain. Relative URLs violate the Sitemap protocol and are rejected by search engines
  • HTTP/HTTPS mixing: listing both protocols signals potential duplicate content and can split link equity between the HTTP and HTTPS versions
  • Invalid changefreq: the allowed values are always, hourly, daily, weekly, monthly, yearly, never
  • Invalid priority: priority must be a decimal between 0.0 and 1.0, inclusive
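
The last two checks in the list are simple enough to sketch directly. The Python below is an illustration under stated assumptions, not the inspector's implementation: the function names are hypothetical, changefreq is compared case-sensitively in lowercase, and priority is parsed with float(), which is more lenient than a strict decimal check (it accepts forms such as 1e-1).

```python
# The seven changefreq values the Sitemap protocol allows.
VALID_CHANGEFREQ = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def is_valid_changefreq(value):
    """True if value is one of the allowed changefreq keywords."""
    return value.strip() in VALID_CHANGEFREQ


def is_valid_priority(value):
    """True if value parses as a number in the range 0.0 to 1.0 inclusive."""
    try:
        number = float(value)
    except ValueError:
        return False
    # NaN and infinity fail this comparison, so they are rejected too.
    return 0.0 <= number <= 1.0


print(is_valid_changefreq("weekly"))       # True
print(is_valid_changefreq("fortnightly"))  # False
print(is_valid_priority("0.8"))            # True
print(is_valid_priority("1.5"))            # False
```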

Sitemap Best Practices

  • Keep each sitemap to at most 50,000 URLs and 50 MB (52,428,800 bytes) uncompressed; use a sitemap index for larger sites
  • Use HTTPS for all URLs if your site serves content over HTTPS
  • Update lastmod only when page content actually changes — do not auto-update timestamps
  • Remove duplicate URLs — each page should appear exactly once
  • Use absolute URLs with consistent trailing slashes matching your canonical URLs
  • Reference your sitemap in robots.txt with a Sitemap directive
  • Submit your sitemap to Google Search Console and Bing Webmaster Tools
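
For the first practice above, one possible way to split a large URL list and reference the pieces from a sitemap index is sketched below in Python. The helper names and the sitemap file URLs are hypothetical, and the sketch only enforces the URL-count limit; checking the byte size of each generated file is left out.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit from the Sitemap protocol


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML that references each child sitemap URL."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in sitemap_urls:
        # escape() turns characters such as & into XML entities.
        lines.append(f"  <sitemap><loc>{escape(url)}</loc></sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines)


# Hypothetical site with 120,000 pages.
all_urls = [f"https://example.com/p{i}" for i in range(120_000)]
chunks = chunk_urls(all_urls)
index_xml = build_sitemap_index(
    f"https://example.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)
)
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20000]
```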

Privacy and Security

All inspection happens entirely in your browser using the native DOMParser API. Your sitemap content — which reveals your site structure and all indexed URLs — is never transmitted to any server. No data is stored, logged, or shared.

Frequently Asked Questions

What does the Sitemap XML Inspector detect?

The inspector counts all URLs in your sitemap, detects duplicate entries, validates lastmod dates against W3C Datetime format (ISO 8601), identifies non-absolute URLs that search engines cannot process, detects HTTP/HTTPS protocol mixing, and validates changefreq and priority values.

Why are duplicate URLs in a sitemap a problem?

Duplicate URLs can waste crawl budget: search engines allocate a limited number of requests to crawling your site. When the same URL appears multiple times, crawlers may spend resources on the same page again instead of discovering new content. Remove duplicates so that each URL is listed exactly once.
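
A duplicate check can be as small as a frequency count. The Python sketch below is one way to do it, with a hypothetical function name; it compares URLs as exact strings after trimming whitespace, so variants that differ by trailing slash or letter case are treated as different URLs. Whether the inspector normalizes URLs before comparing is not stated here.

```python
from collections import Counter


def find_duplicates(urls):
    """Return {url: count} for every URL that appears more than once."""
    counts = Counter(url.strip() for url in urls)
    return {url: n for url, n in counts.items() if n > 1}


# The loc values from the example sitemap on this page.
locs = [
    "https://example.com/page",
    "https://example.com/page",
    "http://example.com/blog",
    "/about",
]
print(find_duplicates(locs))  # {'https://example.com/page': 2}
```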

What format should lastmod dates use?

Lastmod dates must use the W3C Datetime format (a subset of ISO 8601). Valid formats are YYYY (2024), YYYY-MM (2024-03), YYYY-MM-DD (2024-03-15), and a full date and time with a timezone designator, such as 2024-03-15T10:30:00Z or 2024-03-15T10:30:00+02:00. A time without a timezone designator is not valid. Search engines may ignore dates in other formats.
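
The accepted shapes can be expressed as a single pattern plus a calendar check. The Python sketch below is one possible validator, not the inspector's actual rule set; the names W3C_DATETIME and is_valid_lastmod are hypothetical, and the timezone offset is only checked for its hh:mm shape, not for a plausible range.

```python
import re
from datetime import date

# Year, then optionally month, day, and a time that must carry a timezone.
W3C_DATETIME = re.compile(
    r"(\d{4})"
    r"(?:-(\d{2})"
    r"(?:-(\d{2})"
    r"(?:T(\d{2}):(\d{2})(?::(\d{2})(?:\.\d+)?)?"
    r"(Z|[+-]\d{2}:\d{2}))?)?)?",
    re.ASCII,
)


def is_valid_lastmod(value):
    """True if value is a W3C Datetime string naming a real calendar date."""
    match = W3C_DATETIME.fullmatch(value.strip())
    if not match:
        return False
    year, month, day, hour, minute, second, _tz = match.groups()
    try:
        # Missing month or day default to 1 so the date() check still works.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False  # e.g. month 13 or February 30
    if hour is not None and (int(hour) > 23 or int(minute) > 59):
        return False
    if second is not None and int(second) > 59:
        return False
    return True


for sample in ["2024-03-15", "2024-03-15T10:30:00+02:00", "March 15, 2024"]:
    print(sample, is_valid_lastmod(sample))
```

The calendar check matters because a pattern alone would accept a value such as 2024-02-30.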

Why must sitemap URLs be absolute?

The Sitemap protocol specification requires all URLs to be fully qualified (absolute), starting with http:// or https://. Relative URLs like /page or ../page are invalid and will be ignored by search engine crawlers. Always include the full domain in each URL entry.
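
As a rough illustration of this rule, the Python sketch below uses the standard library's urlsplit to test whether a URL carries both an http or https scheme and a host. The function name is hypothetical, and the inspector itself may apply a different test.

```python
from urllib.parse import urlsplit


def is_absolute_url(url):
    """True if url has an http or https scheme and a non-empty host part."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        return False  # urlsplit rejects some malformed hosts
    return parts.scheme in ("http", "https") and bool(parts.netloc)


for candidate in ["https://example.com/page", "/about", "../page",
                  "//example.com/page", "example.com/page"]:
    print(candidate, is_absolute_url(candidate))
```

Only the first candidate passes; the protocol-relative and scheme-less forms are rejected along with the relative paths.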

What is HTTP/HTTPS protocol mixing?

Protocol mixing occurs when a sitemap contains both http:// and https:// URLs. Search engines treat the http:// and https:// versions of a page as separate URLs, which can cause duplicate content issues and split link equity. If your site uses HTTPS (recommended), all sitemap URLs should also use HTTPS.
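
Detecting mixing comes down to counting schemes. The following Python sketch shows one way to do that; the function names are hypothetical, and URLs with no scheme or another scheme are simply left out of the count.

```python
from urllib.parse import urlsplit


def count_schemes(urls):
    """Count how many URLs use http and how many use https."""
    counts = {"http": 0, "https": 0}
    for url in urls:
        scheme = urlsplit(url.strip()).scheme  # urlsplit lowercases the scheme
        if scheme in counts:
            counts[scheme] += 1
    return counts


def has_protocol_mixing(urls):
    """True if the list contains at least one http and one https URL."""
    counts = count_schemes(urls)
    return counts["http"] > 0 and counts["https"] > 0


# The loc values from the example sitemap on this page.
locs = [
    "https://example.com/page",
    "https://example.com/page",
    "http://example.com/blog",
    "/about",
]
print(count_schemes(locs))        # {'http': 1, 'https': 2}
print(has_protocol_mixing(locs))  # True
```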

Does this tool support sitemap index files?

Yes. The inspector supports both standard sitemaps (<urlset>) and sitemap index files (<sitemapindex>). For sitemap indexes, it extracts and validates the referenced sitemap URLs including their lastmod dates.
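
Telling the two file types apart only requires looking at the root element. The Python sketch below illustrates the idea with the standard library; it is an assumption about how such a check could work, not a description of the inspector's code, and read_sitemap, INDEX_EXAMPLE, and the sitemap file URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# A small, made-up sitemap index for demonstration.
INDEX_EXAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-1.xml</loc>
    <lastmod>2024-03-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-2.xml</loc>
  </sitemap>
</sitemapindex>"""


def read_sitemap(xml_text):
    """Return (kind, entries), where kind is 'urlset' or 'sitemapindex'."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    if root.tag == NS + "urlset":
        kind, child = "urlset", "url"
    elif root.tag == NS + "sitemapindex":
        kind, child = "sitemapindex", "sitemap"
    else:
        raise ValueError(f"unexpected root element: {root.tag}")
    entries = []
    for node in root.findall(NS + child):
        loc = (node.findtext(NS + "loc") or "").strip()
        lastmod = node.findtext(NS + "lastmod")
        entries.append((loc, lastmod.strip() if lastmod else None))
    return kind, entries


kind, entries = read_sitemap(INDEX_EXAMPLE)
print(kind, len(entries))  # sitemapindex 2
```

Because both file types share the loc and lastmod child elements, the same URL and date checks can be applied to the entries of either one.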

Is my sitemap data sent to any server?

No. All inspection happens entirely in your browser using the native DOMParser API. Your sitemap content — which reveals your site structure and all indexed URLs — never leaves your device. No data is stored, logged, or transmitted.

What is the maximum sitemap size I can inspect?

The inspector can handle sitemaps with thousands of URLs. However, very large sitemaps (over 50,000 URLs or 50MB) may cause the browser to slow down. The Sitemap protocol itself limits individual sitemap files to 50,000 URLs and 50MB uncompressed. Consider splitting large sitemaps using a sitemap index.
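
If you want to check a file against the protocol limits before pasting it in, a few lines of Python are enough. This is a sketch with hypothetical helper names; it measures the UTF-8 size of the text, which matches the uncompressed file size only when the file is stored as UTF-8.

```python
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB = 52,428,800 bytes


def utf8_size(xml_text):
    """Size of the sitemap text in bytes when encoded as UTF-8."""
    return len(xml_text.encode("utf-8"))


def check_limits(url_count, size_bytes):
    """Return a list of messages, one per protocol limit that is exceeded."""
    problems = []
    if url_count > MAX_URLS:
        problems.append(f"{url_count} URLs exceeds the limit of {MAX_URLS}")
    if size_bytes > MAX_BYTES:
        problems.append(f"{size_bytes} bytes exceeds the limit of {MAX_BYTES}")
    return problems


print(check_limits(url_count=50_000, size_bytes=1_000_000))  # []
print(check_limits(url_count=60_000, size_bytes=1_000_000))
```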