From Google's support site:
To verify Googlebot as the caller:
1. Run a reverse DNS lookup on the accessing IP address from your logs, using the host command.
2. Verify that the domain name is in either googlebot.com or google.com.
3. Run a forward DNS lookup on the domain name retrieved in step 1 using the host command on the retrieved domain name. Verify that it is the same as the original accessing IP address from your logs.
My question is: why is the forward DNS lookup necessary? Can an attacker create a DNS record of the form crawl-xx-xx-xx-xx.googlebot.com?
I am actually seeing this in my logs, from other crawlers as well: IPs whose reverse DNS lookup returns the correct domain, but whose forward lookup does not return the IP. How is this possible?
Reverse zones can be served by anybody. If you own the IP space and get your ISP to delegate reverse lookups to you, you can serve a reverse zone that points at anything you want.
As an attacker, I can buy any IP block and serve my own zone for 4.3.2.1.in-addr.arpa that says all records are in crawl-xx-xx-xx-xx.googlebot.com.
I cannot control Google's forward DNS for that zone, though. So even though I can get a reverse lookup for 1.2.3.4 to return crawl-12-34-56-78.googlebot.com, I cannot get a forward lookup on crawl-12-34-56-78.googlebot.com to return 1.2.3.4.
The inconsistent entries in your logs are almost certainly third-party bots trying (quite well) to impersonate Google.
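If you want to automate the check instead of running host by hand, here is a minimal Python sketch of the same reverse-then-forward verification. It is not Google's code. The function names and the ALLOWED_SUFFIXES constant are my own, and the comparison is a plain string match, so it assumes IPv4 addresses in the usual dotted form. The two resolver functions are passed in as parameters only so the logic can be exercised without touching real DNS.

```python
import socket

# Domains Google's instructions say a genuine crawler hostname must fall under.
ALLOWED_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return None


def forward_lookup(hostname):
    """Return the set of address strings that hostname resolves to."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return {info[4][0] for info in infos}


def is_verified_googlebot(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Reverse lookup, domain check, then forward lookup back to the same IP."""
    hostname = reverse(ip)
    if hostname is None:
        return False
    hostname = hostname.rstrip(".").lower()
    # Step 2: the PTR name must be under an allowed domain.
    # On its own this proves nothing, since the IP owner controls the PTR.
    if not hostname.endswith(ALLOWED_SUFFIXES):
        return False
    # Step 3: only Google controls the forward zone, so this is the real check.
    return ip in forward(hostname)
```

With a spoofed PTR like the one described above, the reverse step returns crawl-12-34-56-78.googlebot.com for 1.2.3.4, the suffix check passes, and the forward step is what rejects the request, because Google's zone does not map that name back to 1.2.3.4.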