What an XML Sitemap Does
An XML sitemap is a machine-readable list of URLs that you want search engines to discover and crawl. Think of it as a clean “directory” for bots. When a crawler already knows your site well, it can still find most URLs by following links. But when you launch new content, publish thousands of pages, or run a site with complex navigation (filters, pagination, faceted categories), a sitemap can dramatically improve discovery and crawling efficiency.
A sitemap does not replace internal linking, and it does not force indexing. It simply makes discovery easier and gives crawlers a structured list of pages that matter. If you’re serious about SEO—especially at scale—having a correct sitemap is one of the simplest technical wins available.
Why Generate a Sitemap from a URL List
Many sitemap tools assume you have a CMS plugin or a crawler that can traverse every page. That works for some sites, but it can be slow, inaccurate, or impossible for others. A URL-list generator is different: it starts from the URLs you already trust—your catalog, your database export, your router map, or your hand-curated list—and turns that into a standards-friendly sitemap.xml.
This approach is ideal when you’re building a large tool site, shipping new categories quickly, migrating domains, or managing pages that are generated from structured arrays. If you can export your canonical URLs, you can generate a clean sitemap without crawling anything.
What Search Engines Expect in a Sitemap
At minimum, each entry includes a <loc> URL. You can optionally include:
- lastmod: the date the content changed
- changefreq: how frequently content changes
- priority: a relative importance hint
In practice, lastmod is the most useful when it’s accurate. The other two fields are often ignored by major search engines, but they can still help internal tooling, consistency checks, and some secondary crawlers.
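For reference, here is what a minimal sitemap file with one fully populated entry looks like (the URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/blog/</loc>
    <lastmod>2024-05-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

Only the loc element is required; every other child of url can be omitted.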
How the 50,000 URL and 50MB Limits Work
A single sitemap file should include no more than 50,000 URLs and should remain under about 50MB uncompressed. When you exceed either limit, you split your sitemap into multiple files and then create a sitemap index that lists those files.
This tool checks your URL count, estimates output size, and helps you split large lists. If you’re running a big site, splitting is normal and not a problem. Search engines handle sitemap indexes extremely well.
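The splitting logic is simple to reason about. This is a minimal sketch in Python (the sitemap file naming is illustrative, not a standard):

```python
from itertools import islice

MAX_URLS_PER_SITEMAP = 50_000  # limit from the sitemaps.org protocol


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a URL list into sitemap-sized chunks."""
    it = iter(urls)
    while chunk := list(islice(it, size)):
        yield chunk


def index_entries(base_url, n_chunks):
    """Sitemap file URLs a sitemap index would reference (naming is illustrative)."""
    return [f"{base_url}/sitemap-{i + 1}.xml" for i in range(n_chunks)]


# 120,000 URLs -> three sitemap files (50k + 50k + 20k)
chunks = list(chunk_urls(f"https://example.com/page/{i}" for i in range(120_000)))
```

The sitemap index file then lists each chunk's URL in its own sitemap element, and you submit only the index.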
Best Practices for Clean Sitemap URLs
Use canonical URLs
Include the version of a URL you want indexed. If your site supports many parameter variations, don’t dump them all into your sitemap. Choose the canonical version (for example, the category page without tracking parameters).
Avoid duplicates
Duplicates waste crawler attention and make it harder to debug indexing issues. This generator can remove duplicates automatically, which is usually the right choice for a production sitemap.
Be consistent with www, https, and trailing slashes
Inconsistent URL formats create “multiple versions” of the same page in crawl systems. If your site is canonical on https + non-www (or https + www), keep the sitemap consistent. The normalization settings are designed to help you match your preferred format quickly.
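Normalization and deduplication are easy to script if you prefer to clean your list before pasting. A sketch using only the Python standard library (the policy flags — force https, strip www, drop query strings — are illustrative defaults, not rules):

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url, force_https=True, strip_www=True, strip_query=True):
    """Rewrite a URL into one consistent canonical shape."""
    parts = urlsplit(url.strip())
    scheme = "https" if force_https else parts.scheme
    host = parts.netloc.lower()          # hostnames are case-insensitive
    if strip_www and host.startswith("www."):
        host = host[4:]
    path = parts.path or "/"
    query = "" if strip_query else parts.query
    return urlunsplit((scheme, host, path, query, ""))


def dedupe(urls):
    """Remove duplicates while preserving input order."""
    return list(dict.fromkeys(urls))
```

Running your export through a pass like this before generating the sitemap means every entry already matches your canonical format.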
How to Use This XML Sitemap Generator
- Paste your URLs (one per line). Full URLs work best.
- Optionally set a Base URL if you are pasting relative paths like /blog/.
- Choose cleanup options like stripping query strings or removing duplicates.
- Add metadata only if it reflects real updates (especially lastmod).
- Generate and review the results summary and output XML.
- Validate limits and fix issues if needed.
- Split + Index if your URL set is large.
- Download and upload the sitemap to your site, then submit it to Search Console.
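The generation step itself is straightforward. A minimal sketch of turning a URL list into sitemap XML with Python's standard library (applying a single lastmod to every entry, which you should only do if it reflects real updates):

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls, lastmod=None):
    """Serialize a URL list as a sitemap.xml string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        if lastmod:  # omit lastmod unless it reflects a real content update
            ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


xml = build_sitemap(["https://example.com/", "https://example.com/blog/"])
```

Write the returned string to sitemap.xml and it is ready to upload and submit.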
Where to Put Your Sitemap File
The most common location is your site root: https://example.com/sitemap.xml. That’s easy to remember, easy to submit, and easy for crawlers to find.
Larger sites often place sitemaps in a folder like /sitemaps/, especially when using an index file and multiple split sitemaps.
Both are valid. What matters is that the files are publicly accessible (not blocked by authentication, IP restrictions, or robots rules) and that the URLs inside them are canonical, reachable, and return the correct status codes.
How to Submit a Sitemap
Google Search Console
Add your sitemap URL in the Sitemaps section. If you use a sitemap index, submit the index URL (for example, /sitemap-index.xml).
Search Console will show discovery, fetch status, and any sitemap parsing errors.
robots.txt
You can also reference your sitemap in robots.txt using a Sitemap: line. This is not required if you submit in Search Console, but it’s
a nice extra signal and helps other crawlers find your sitemap automatically.
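The reference is a single line anywhere in robots.txt, using the full sitemap URL (the index file name here is a placeholder):

```
Sitemap: https://example.com/sitemap-index.xml
```

You can list multiple Sitemap: lines if you maintain several sitemap files without an index.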
Common Sitemap Mistakes
Including non-canonical or duplicate URLs
If your sitemap includes both http and https versions, or both www and non-www versions, crawlers may waste time and you’ll get confusing indexing signals. Fix by canonicalizing URLs in your sitemap and ensuring redirects and canonical tags match.
Including blocked or noindex pages
A sitemap should list pages you want crawled and indexed. If you include URLs that are blocked by robots.txt, require login, or are noindex, it can create noise and slow down diagnosis. Make sure your sitemap aligns with your real SEO goals.
Stuffing lastmod with “today” for everything
Setting lastmod to today for every URL can look suspicious and may reduce the usefulness of the signal. Use lastmod when it reflects real content updates. If you don’t have reliable dates, it’s better to omit lastmod entirely than to provide misleading metadata.
What If Google Doesn’t Index URLs from My Sitemap
A sitemap improves discovery, but indexing depends on quality and accessibility. If URLs are “Discovered – currently not indexed” or “Crawled – currently not indexed,” review content quality, duplication, internal linking, canonical tags, and whether the pages are truly useful. A sitemap can’t fix thin or duplicate content by itself.
When You Should Split Your Sitemap
Splitting is required when you exceed limits, but it can also be helpful for organization. Many sites keep separate sitemaps for:
- Tools / product pages
- Blog posts
- Categories
- Country or language sections
This makes it easier to audit coverage and diagnose indexing issues by section. If one sitemap starts showing errors, you can isolate the affected area faster.
FAQ
Answers about sitemap limits, lastmod, splitting, indexing, and where to upload your sitemap files.
What is an XML sitemap?
An XML sitemap is a file (usually sitemap.xml) that lists the URLs on your site you want search engines to discover and crawl. It can include optional metadata like lastmod, changefreq, and priority.
How many URLs can a sitemap contain?
A single sitemap file should contain no more than 50,000 URLs and be no larger than 50MB uncompressed. If you exceed either limit, split into multiple sitemaps and use a sitemap index file.
What is a sitemap index?
A sitemap index is an XML file that lists multiple sitemap files. It’s used when your site needs more than one sitemap (for example, if you have more than 50,000 URLs).
Are lastmod, changefreq, and priority required?
They are optional. lastmod can be useful when it reflects real content updates. changefreq and priority are often ignored by major search engines, but they can still help with internal consistency and tooling.
Should I include URLs with query parameters?
Usually no, unless parameter URLs represent canonical pages you want indexed. In most cases, include clean canonical URLs and avoid duplicates created by tracking parameters.
Can I generate a sitemap from a plain list of URLs?
Yes. If you have a list of URLs (from your router, database, exported links, or a manual list), you can paste them here and generate a valid sitemap.xml.
Where should I upload my sitemap?
Typically to your site root (https://example.com/sitemap.xml). You can also use sitemap indexes and place them in a dedicated location, as long as the URLs are publicly accessible.
How do I submit my sitemap?
Add it in Google Search Console under Sitemaps, and optionally reference it in robots.txt using a Sitemap: line. The tool provides a ready-to-copy robots.txt line.
Does a sitemap guarantee indexing?
No. A sitemap helps discovery and crawl efficiency, but indexing depends on quality, canonicalization, access, and whether the pages meet search engine guidelines.
Does this tool store my URLs?
No. Generation runs in your browser and your pasted URLs are not stored by this tool.