
Robots.txt "Allow" command?


In my robots.txt:

Disallow: /account
Allow: /account/

On my site, there is a page at /account that is only accessible to someone who is logged in, but to see the profile page of another user, you would go to /account/username. So I want robots.txt to disallow the single /account page but allow the directory. Does this setup work?

Corollary: Does Disallow: /account also disallow the directory /account/, or am I just wasting my time by explicitly allowing it?


Solution

  • A couple of things to watch out for here.

    First, as @plasticinsect said in his answer, the most specific rule wins, but only for Googlebot. Other bots use the rule from the original robots.txt protocol, which says directives are processed in sequential order: the order in which they appear in the robots.txt file. Those bots would see the Disallow first and stop.

    At a minimum, you should swap the order of the Allow and Disallow, so that sequential-order bots hit the Allow first, as in the sketch below.
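
    Here is a minimal sketch of the reordered file; the User-agent line is assumed, since your question doesn't show one:

    User-agent: *
    Allow: /account/
    Disallow: /account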

    In addition, there is sometimes disagreement about whether /account and /account/ are different URLs. If a crawler hits your site with http://example.com/account/, this robots.txt is going to allow it, even though the trailing-slash URL probably serves the same private page as /account. You probably also want to disallow /account/$. That won't stop all bots (those that don't support the $ end-of-string marker will ignore the directive), but it's worth a shot.

    Given that, I would suggest:

    Disallow: /account/$
    Allow: /account/
    Disallow: /account
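
    If you want to sanity-check the ordering behavior, Python's standard-library urllib.robotparser applies rules in file order and does not support the $ end-of-string marker, so it acts like one of the older, stricter bots described above. A small test (example.com and TestBot are placeholders):

    from urllib.robotparser import RobotFileParser

    rules = """\
    User-agent: *
    Disallow: /account/$
    Allow: /account/
    Disallow: /account
    """

    rp = RobotFileParser()
    rp.parse(rules.splitlines())

    # Prints: /account False, /account/ True, /account/username True.
    # /account/ slips through because this parser treats the $ rule as a
    # literal path that never matches, effectively ignoring it.
    for path in ("/account", "/account/", "/account/username"):
        print(path, rp.can_fetch("TestBot", "http://example.com" + path))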
    

    Or, if you're just worried about Googlebot and other major crawlers that support the $ marker, you can rely on the default-allow behavior (anything not disallowed is crawlable) and block only the two exact URLs:

    Disallow: /account$
    Disallow: /account/$