I'm using the following IIS Rewrite Rule to block as many bots as possible.
<rule name="BotBlock" stopProcessing="true">
  <match url=".*" />
  <conditions>
    <add input="{HTTP_USER_AGENT}" pattern="^$|bot|crawl|spider" />
  </conditions>
  <action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Forbidden" />
</rule>
This rule blocks all requests with an empty User-Agent string or a User-Agent string that contains bot, crawl or spider. This works great, but it also blocks googlebot, which I do not want.
So how do I exclude the googlebot string from the above pattern, so that Googlebot can still reach the site?
I've tried
^$|!googlebot|bot|crawl|spider
^$|(?!googlebot)|bot|crawl|spider
^(?!googlebot)$|bot|crawl|spider
^$|(!googlebot)|bot|crawl|spider
But they either block all User-Agents or still do not allow googlebot. Who has a solution and knows a bit about regex?
Thanks to The fourth bird, the solution becomes:
<add input="{HTTP_USER_AGENT}" pattern="^$|\b(?!.*googlebot.*\b)\w*(?:bot|crawl|spider)\w*" />
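For comparison, here is a minimal sketch of a different technique that avoids lookarounds altogether: keep the original pattern and add a second, negated condition for googlebot. This is not the approach from the answer, just an alternative built on the rewrite module's negate and logicalGrouping attributes. As far as I know, Googlebot's published User-Agent also contains a URL ending in bot.html, which a single pattern can still match, so a separate exclusion may be easier to reason about. Test it against your own traffic before relying on it.

```xml
<rule name="BotBlock" stopProcessing="true">
  <match url=".*" />
  <!-- MatchAll: every condition below must hold for the rule to fire -->
  <conditions logicalGrouping="MatchAll">
    <!-- Empty User-Agent, or one containing bot, crawl or spider -->
    <add input="{HTTP_USER_AGENT}" pattern="^$|bot|crawl|spider" ignoreCase="true" />
    <!-- negate="true": this condition holds only when googlebot is NOT found -->
    <add input="{HTTP_USER_AGENT}" pattern="googlebot" ignoreCase="true" negate="true" />
  </conditions>
  <action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Forbidden" />
</rule>
```

With this layout, more allowed crawlers can be added by extending the negated pattern (for example googlebot|somebot, where somebot is a made-up name) without touching the blocking pattern.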
If you want to match bot, but not when it is directly preceded by google (as in googlebot):
^$|(?<!\bgoogle)bot|crawl|spider
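When this pattern goes into web.config, the opening angle bracket of the lookbehind has to be written as &lt;, because a literal < is not allowed inside an XML attribute value. A minimal sketch, assuming the rewrite module's regex engine accepts lookbehind at all (that is not confirmed here, so verify it before deploying):

```xml
<conditions>
  <!-- The lookbehind's opening angle bracket is escaped as &lt; to keep the attribute well-formed -->
  <add input="{HTTP_USER_AGENT}" pattern="^$|(?&lt;!\bgoogle)bot|crawl|spider" ignoreCase="true" />
</conditions>
```

The ignoreCase attribute is spelled out only to make the intent explicit, since real User-Agent strings usually write Googlebot with a capital G.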
Or you could group the alternatives in a non-capturing group and surround that group with word boundaries to prevent partial matches for all alternatives:
^$|\b(?:bot|crawl|spider)\b
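Dropped into the rule from the question, the word-boundary version could look like the sketch below. Keep in mind that whole-word matching changes what gets blocked in general: only the standalone words bot, crawl and spider match, so a longer token such as somebot or crawler (both made-up examples) would now pass through as well.

```xml
<rule name="BotBlock" stopProcessing="true">
  <match url=".*" />
  <conditions>
    <!-- \b on both sides: googlebot no longer matches, but neither does any other word that merely contains bot -->
    <add input="{HTTP_USER_AGENT}" pattern="^$|\b(?:bot|crawl|spider)\b" ignoreCase="true" />
  </conditions>
  <action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Forbidden" />
</rule>
```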