search-engine, sitemap, robots.txt

Site Map index and robots.txt referencing .gz files


My Site Map Index file and all of my Site Maps are gzipped, with names such as SiteMapIndex.xml.gz, SiteMap1.xml.gz, and SiteMap2.xml.gz. Should the robots.txt file and the SiteMapIndex.xml file reference the gzipped file names or the non-gzipped file names?

For example, should the robots.txt contents look like this?

Sitemap: http://www.mysite.com/SiteMapIndex.xml.gz

or like this (without the .gz)?

Sitemap: http://www.mysite.com/SiteMapIndex.xml

Should the SiteMapIndex.xml contents look like this?

...
<sitemap>
  <loc>http://www.mysite.com/SiteMap1.xml.gz</loc>
  <lastmod>2013-08-20</lastmod>
</sitemap>
<sitemap>
  <loc>http://www.mysite.com/SiteMap2.xml.gz</loc>
  <lastmod>2013-08-20</lastmod>
</sitemap>
...

or like this (without the .gz)?

...
<sitemap>
  <loc>http://www.mysite.com/SiteMap1.xml</loc>
  <lastmod>2013-08-20</lastmod>
</sitemap>
<sitemap>
  <loc>http://www.mysite.com/SiteMap2.xml</loc>
  <lastmod>2013-08-20</lastmod>
</sitemap>
...

Solution

  • If you want the bot to read the .gz file, put the .gz name in the index, because the <loc> value is the URL the bot will request. That is:

    <sitemap>
      <loc>http://www.mysite.com/SiteMap1.xml.gz</loc>
      <lastmod>2013-08-20</lastmod>
    </sitemap>
    <sitemap>
      <loc>http://www.mysite.com/SiteMap2.xml.gz</loc>
      <lastmod>2013-08-20</lastmod>
    </sitemap>
    

    See "Using Sitemap index files" in the sitemaps.org protocol documentation.

    The same goes for your robots.txt file: reference the gzipped file name.

    See "Specifying the Sitemap location in your robots.txt file" in the same documentation.
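
To make the rule above concrete, here is a minimal Python sketch (3.8 or later) that builds a sitemap index whose <loc> entries carry the .gz names, along with the matching robots.txt line. The helper names build_sitemap_index and robots_sitemap_line are hypothetical, invented for this illustration, and the URLs and date are the ones from the question. It is one possible way to generate the files, not a required approach.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(base_url, file_names, lastmod):
    """Return sitemap index XML as bytes (hypothetical helper).

    Each <loc> uses the file name exactly as it is served,
    so gzipped sitemaps keep their .gz suffix.
    """
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in file_names:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base_url}/{name}"
        ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def robots_sitemap_line(base_url, index_name):
    """Return the Sitemap directive for robots.txt (hypothetical helper)."""
    return f"Sitemap: {base_url}/{index_name}"


# Example values taken from the question.
index_xml = build_sitemap_index(
    "http://www.mysite.com",
    ["SiteMap1.xml.gz", "SiteMap2.xml.gz"],
    "2013-08-20",
)

# Compressed bytes that would be saved and served as SiteMapIndex.xml.gz.
index_gz = gzip.compress(index_xml)

robots_line = robots_sitemap_line("http://www.mysite.com", "SiteMapIndex.xml.gz")
print(robots_line)
```

The point of the sketch is that the same string appears in three places: the name of the file on the server, the <loc> entry in the index, and the Sitemap line in robots.txt.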