Tags: web-scraping, bots, search-engine-bots

Search bot detection


Is it possible to prevent a site from being scraped by any scraper, while at the same time allowing search engines to parse your content?

Just checking the User-Agent is not the best option, because it's very easy to spoof.

JavaScript checks could be an option (Google executes JS), but a good scraper can run JavaScript too.

Any ideas?


Solution

  • Use DNS checking, Luke! :)

    1. Check the User-Agent header to see if the request identifies itself as a search engine bot
    2. If so, get the IP address that requested the page
    3. Do a reverse DNS lookup on that IP address to get a hostname
    4. Verify the hostname belongs to a domain the search engine controls (e.g. googlebot.com or google.com for Googlebot)
    5. Do a forward DNS lookup on the hostname and confirm it resolves back to the original IP address

    The same idea is described in Google's help article, Verifying Googlebot.
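
    A minimal sketch of the lookup steps in Python, using only the standard library's `socket` module. The `GOOGLEBOT_SUFFIXES` list reflects the domains Google documents for Googlebot; other search engines publish their own suffixes (e.g. `search.msn.com` for Bingbot), so treat the list here as an assumption to adapt:

    ```python
    import socket

    # Domains Google documents for Googlebot reverse-DNS hostnames
    # (assumption: extend this tuple for other crawlers you trust).
    GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")

    def is_verified_googlebot(ip: str) -> bool:
        """Verify that an IP claiming to be Googlebot really is, via
        reverse DNS followed by a confirming forward DNS lookup."""
        try:
            # Reverse DNS: IP address -> hostname
            hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
        except socket.herror:
            return False  # no PTR record at all

        # The hostname must sit in a domain Google controls; anyone can
        # point a PTR record at a fake "googlebot" name otherwise.
        if not hostname.endswith(GOOGLEBOT_SUFFIXES):
            return False

        try:
            # Forward DNS: hostname -> IP addresses
            _name, _aliases, forward_ips = socket.gethostbyname_ex(hostname)
        except socket.gaierror:
            return False

        # The forward lookup must resolve back to the original IP.
        return ip in forward_ips
    ```

    The forward-confirmation step is what makes this robust: a scraper can forge its User-Agent and even its PTR record, but it cannot make Google's authoritative DNS resolve a googlebot.com hostname to its own IP. Since each check costs two DNS round-trips, you would typically cache the verdict per IP rather than look it up on every request.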