Block specific file types from Google Search


I want to block XML files from Googlebot, except sitemap.xml. I am using Lazyest Gallery for my WordPress image gallery. Every gallery folder has an XML file containing the details of its images. The problem is that Google now indexes those XML files instead of the galleries. My site search also shows XML files instead of albums. Will

Disallow: /*/*.xml$

work?

I have excluded feeds by adding

Disallow: /*/rss/$

to my robots.txt.


Solution

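  • The simplest way to block all files of a certain type is a wildcard rule, placed inside a User-agent group (for example User-agent: Googlebot), where * matches any sequence of characters and $ anchors the end of the URL: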

    Disallow: /*.xml$
    Disallow: /*.XML$
    

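    Paths in robots.txt are case-sensitive, hence the two entries (you can leave one out if you know all your files use the same case). To make sure sitemap.xml is not blocked, add an Allow rule for it. Google applies the most specific (longest) matching rule regardless of order, but listing the Allow rule first also keeps it safe with parsers that stop at the first match: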

    Allow: /sitemap.xml
    Disallow: /*.xml$
    Disallow: /*.XML$
    

    robots.txt also supports a Sitemap directive that tells crawlers where the sitemap lives. It takes a full URL, so we can add that too:

    Allow: /sitemap.xml
    Disallow: /*.xml$
    Disallow: /*.XML$
    
    Sitemap: http://example.com/sitemap.xml
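
To sanity-check rules like the ones above before deploying them, here is a minimal Python sketch of wildcard matching with longest-match precedence, as described in Google's robots.txt documentation. It is an illustration only, not Google's actual parser: the helper names pattern_to_regex and is_allowed and the sample gallery paths are hypothetical, and the sketch ignores User-agent groups and percent-encoding. Python's built-in urllib.robotparser is not used because it does not interpret the * and $ wildcards.

```python
import re
from typing import Sequence, Tuple


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end
    of the URL path. Without '$' the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(path: str, rules: Sequence[Tuple[str, str]]) -> bool:
    """Return True if `path` may be crawled under `rules`.

    Assumed precedence: the longest matching pattern wins, Allow wins
    a tie, and a path that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for directive, pattern in rules:
        # re.match anchors at the start of the path, like robots.txt rules do.
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed


# The rules from the answer above.
RULES = [
    ("Allow", "/sitemap.xml"),
    ("Disallow", "/*.xml$"),
    ("Disallow", "/*.XML$"),
]

if __name__ == "__main__":
    # Hypothetical paths, chosen only to exercise each rule.
    for path in ("/sitemap.xml",
                 "/gallery/album/captions.xml",
                 "/gallery/album/CAPTIONS.XML",
                 "/gallery/album/"):
        print(path, "->", "allowed" if is_allowed(path, RULES) else "blocked")
```

Under these assumptions the sketch reports /sitemap.xml and the album page as allowed and both gallery XML files as blocked. Note that a mixed-case name such as /gallery/file.Xml matches neither Disallow rule and would still be crawlable, which is the practical consequence of case-sensitive matching.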