Tags: search-engine, web-crawler, robots.txt, search-engine-bots

The * character in the Disallow statement of the robots.txt file


How do different search bots interpret the * character in the Disallow statement of a robots.txt file? Do they all treat it as "zero, one, or more characters"?

Let's take the following example:

User-agent: *
Disallow: /back-end*/*

What does the rule above mean? Does it mean that no directory whose name starts with "back-end" will be indexed, even if "back-end" is followed by other characters? And what about the * after the /? Is it good practice to write it?

Generally speaking, my question is about how * is used in the Disallow statement and whether all search engine crawlers treat it the same way.


Solution

  • The original Robots Exclusion Standard does not say anything about the * character in the Disallow: statement. Some crawlers, such as Googlebot and Slurp, recognize * as a wildcard, while MSNbot and Teoma interpret it in different ways.
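Because the original standard leaves * undefined, there is no single correct reading. As one illustration, here is a minimal sketch of the wildcard rules Google documents for Googlebot (the same rules were later written into RFC 9309): a rule is a prefix match against the URL path, * matches zero or more of any character, and a trailing $ anchors the pattern to the end of the path. The function names below are hypothetical, and the sketch deliberately ignores percent-encoding and Allow/Disallow precedence. Under these assumed rules, `Disallow: /back-end*/*` blocks any path that starts with `/back-end`, continues with any characters, and then contains a `/`. The final `*` adds nothing because matching is already by prefix.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a Googlebot-style robots.txt path pattern into a regex.

    Assumed semantics (Google's documented rules / RFC 9309), not the
    original 1994 standard, which does not define '*' at all:
      - the pattern is matched as a prefix of the URL path,
      - '*' matches zero or more of any character,
      - a trailing '$' anchors the pattern to the end of the path.
    """
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each escaped '*' into '.*'.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile("^" + body + ("$" if anchored_end else ""))


def is_disallowed(pattern: str, path: str) -> bool:
    """Hypothetical helper: True if `path` is matched by the Disallow `pattern`."""
    # re.match only anchors at the start, which gives the prefix behaviour.
    return robots_pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    rule = "/back-end*/*"
    for path in ["/back-end/admin.php", "/back-end-v2/login",
                 "/back-end", "/front-end/back-end/x"]:
        print(f"{path!r}: disallowed={is_disallowed(rule, path)}")
    # The trailing '*' is redundant under prefix matching:
    print(is_disallowed("/back-end*/", "/back-end-v2/login"))
```

Note that `/back-end` on its own is not blocked by this rule, because the pattern requires a `/` somewhere after `back-end`. Other parsers can behave differently: Python's standard `urllib.robotparser`, for example, only does plain prefix matching on Disallow paths, so a rule that depends on * may not be honoured by every crawler.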