
Robots.txt: how to allow crawling of one specific URL while excluding other similar ones


I have these types of URLs:

www.example.com/view/a-dF%2Dg3_dG
www.example.com/view/a-K5gD2%3F%f
www.example.com/view/a-b3R%2f%s_2

So basically they start with /view/a- and continue with random characters. I want to block Google from crawling them.

However, there is one exception. I have a URL that looks like this:

www.example.com/view/a-home

This should be an exception; this URL should still be crawled. How can I do this?


Solution

  • This won't work for all bots, but the major search engines now support both Disallow and Allow directives:

    User-Agent: *
    Disallow: /view/a-
    Allow: /view/a-home
    

    The longest matching rule is the one that gets used. Note that the rules must start from the root of the path, so it is /view/a-, not /a-. For /view/a-home both rules match, but the Allow rule is longer, so it wins. For /view/a-dF%2Dg3_dG, only the Disallow rule matches.

    Bots that don't understand the Allow: directive would not be able to crawl /view/a-home.

    You can use Google's robots.txt tester tool to make sure that your robots.txt syntax is correct and that specific URLs are either disallowed or allowed as you expect.
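The longest-match behavior described above can be sketched in a few lines of Python. This is a simplified model for illustration only: it handles plain path prefixes, not the * and $ wildcards that real robots.txt matching also supports, and ties between equally long rules go to Allow, as Google does.

```python
def is_allowed(path, rules):
    """Decide crawlability under the longest-match rule.

    rules: list of (directive, prefix) tuples, e.g. ("Disallow", "/view/a-").
    The matching rule with the longest prefix wins; ties go to Allow.
    If no rule matches, crawling is allowed by default.
    """
    matches = [(len(prefix), directive == "Allow")
               for directive, prefix in rules
               if prefix and path.startswith(prefix)]
    if not matches:
        return True  # no rule matches: allowed by default
    # max() picks the longest prefix; True > False breaks ties toward Allow
    return max(matches)[1]

rules = [("Disallow", "/view/a-"), ("Allow", "/view/a-home")]
print(is_allowed("/view/a-home", rules))        # True: the Allow rule is longer
print(is_allowed("/view/a-dF%2Dg3_dG", rules))  # False: only Disallow matches
```

Running this against the example URLs confirms that only /view/a-home survives the Disallow rule.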