apache, .htaccess, robots.txt, googlebot

Prevent Googlebot from indexing file types in robots.txt and .htaccess


There are many Stack Overflow questions on how to prevent Googlebot from indexing a given file type, for instance txt files. The usual answer looks like this:

robots.txt

User-agent: Googlebot
Disallow: /*.txt$

.htaccess

<Files ~ "\.txt$">
     Header set X-Robots-Tag "noindex, nofollow"
</Files>

However, what is the syntax for each of these when two file types should be kept out of the index? In my case, txt and doc.


Solution

  • In your robots.txt file:

    User-agent: Googlebot
    Disallow: /*.txt$
    Disallow: /*.doc$
    

    More details at Google Webmasters: Create a robots.txt file


  • In your .htaccess file:

    <FilesMatch "\.(txt|doc)$">
        Header set X-Robots-Tag "noindex, nofollow"
    </FilesMatch>
    

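    More details here: http://httpd.apache.org/docs/current/sections.html

    Note that the Header directive requires mod_headers to be enabled. Also, Disallow stops crawling rather than indexing, and Googlebot can only see the X-Robots-Tag header on URLs it is allowed to fetch, so the header has no effect on files that robots.txt already blocks.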
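To sanity-check which paths the two rule sets would cover, here is a minimal Python sketch. It is not part of the original answer, and it only approximates the matching: it assumes Google's documented wildcard semantics for robots.txt (`*` matches any run of characters, a trailing `$` anchors the end of the path) and treats the FilesMatch expression as an ordinary regex applied to the file name. The names `robots_pattern_to_regex`, `is_disallowed`, and `header_applies` are hypothetical helpers, not Googlebot or Apache APIs.

```python
import re

# Patterns copied from the robots.txt and .htaccess rules above.
DISALLOW = ["/*.txt$", "/*.doc$"]
FILES_MATCH = re.compile(r"\.(txt|doc)$")


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the path; without '$' the rule is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path, patterns=DISALLOW):
    """Return True if any Disallow pattern matches the URL path."""
    return any(robots_pattern_to_regex(p).match(path) for p in patterns)


def header_applies(filename):
    """Return True if the FilesMatch regex would match this file name."""
    return FILES_MATCH.search(filename) is not None


if __name__ == "__main__":
    for path in ["/notes/readme.txt", "/a.doc", "/report.docx", "/index.html"]:
        filename = path.rsplit("/", 1)[-1]
        print(path, is_disallowed(path), header_applies(filename))
```

Because both patterns end in `$`, a path such as `/report.docx` is left alone by either rule; dropping the `$` would turn each rule into a looser match.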