
Deny access to sitemap.xml but allow robots (e.g. Google)


Is there a way to allow only search engine robots, such as Google, Yahoo, or other crawlers, to access my sitemap, which is located at http://www.mywebsite.com/sitemap.xml? In other words, is it possible to block direct access by users but still let robots fetch it?


Solution

  • Basically, no. But assuming Apache, you could match on the User-Agent string and deny access to every request that doesn't identify itself as a known crawler:

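    <Location /sitemap.xml>
      # Flag requests whose User-Agent claims to be Googlebot or Yahoo's Slurp
      SetEnvIfNoCase User-Agent "Googlebot" AllowedBot=1
      SetEnvIfNoCase User-Agent "Slurp" AllowedBot=1
      # Apache 2.2 syntax: allow everyone, then deny anything not flagged above
      Order allow,deny
      Allow from all
      Deny from env=!AllowedBot
    </Location>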
    

    But as the page where I found the syntax warns:

    Warning:

    Access control by User-Agent is an unreliable technique, since the User-Agent header can be set to anything at all, at the whim of the end user.
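On Apache 2.4, Order, Allow and Deny are deprecated and only available through mod_access_compat. A minimal, untested sketch of the same check using the 2.4 Require syntax is below, assuming mod_setenvif and mod_authz_core are loaded. The crawler names Googlebot and Slurp (Yahoo's crawler) are examples, not a complete list. It has exactly the same weakness as the 2.2 version: any client can send a Googlebot User-Agent and get through.

```apache
<Location "/sitemap.xml">
    # Set AllowedBot (to "1") when the User-Agent claims to be a known crawler;
    # matching is a case-insensitive regex search anywhere in the header
    SetEnvIfNoCase User-Agent "Googlebot" AllowedBot
    SetEnvIfNoCase User-Agent "Slurp" AllowedBot
    # mod_authz_core: grant access only if AllowedBot is set, deny everyone else
    Require env AllowedBot
</Location>
```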