
After submitting the new sitemap, Google is still looking for old sitemap files


We recently (about 4 months ago) made a few changes to our website. The major change was moving the entire site from HTTP to HTTPS. We also restructured our sitemap files. Previously we had files like sitemap-1.xml, sitemap-2.xml, etc. In the new implementation we replaced all the sitemap files with their .gz versions and submitted an index-sitemap.xml to Google for crawling (following Google's guidelines; a sketch of such an index appears below). Everything is working fine, except that:

  • Google is still trying to fetch the old sitemap files, which results in 404 errors
  • Google is also making requests over HTTP, which result in 301 redirects

Any idea if we might have missed something? Or when will Google stop hitting the old URLs?
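
For reference, a sitemap index following the sitemaps.org protocol looks roughly like this (the domain and file names here are placeholders, not our actual files):

    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>https://example.com/sitemap-1.xml.gz</loc>
        <lastmod>2016-05-01</lastmod>
      </sitemap>
      <sitemap>
        <loc>https://example.com/sitemap-2.xml.gz</loc>
      </sitemap>
    </sitemapindex>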


Solution

  • This is normal behavior. When something returns a 404, Google wants to make sure it was not a temporary accident on the server side, so it retries from time to time before giving up (sometimes only after 6 months).

    You can return a 410 (Gone) instead to tell Google the page (or sitemap file) is gone for good. They will stop retrying much sooner (see the sketch after this answer for one way to configure this).

    Regarding the 301s, some pages on external websites may still point to the HTTP version of your pages. If you don't have control over them, there is nothing you can do about it; just keep the HTTP-to-HTTPS redirect in place so those visits still land on the right pages.
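
    As a minimal sketch of both points, assuming an nginx server (the domain and sitemap file names are placeholders): the old sitemap URLs can be answered with 410 instead of 404, while the plain-HTTP server block keeps issuing the 301 redirect to HTTPS.

        # Retired sitemap files: answer 410 Gone so Google stops retrying sooner
        # (TLS certificate directives omitted for brevity)
        server {
            listen 443 ssl;
            server_name example.com;

            location = /sitemap-1.xml { return 410; }
            location = /sitemap-2.xml { return 410; }
        }

        # Plain-HTTP traffic: the permanent redirect that produces the 301s
        server {
            listen 80;
            server_name example.com;
            return 301 https://$host$request_uri;
        }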