asp.net, web-scraping, .net-3.5

How can I prevent my asp.net site from being screen scraped?


How can I prevent my asp.net 3.5 website from being screen scraped by my competitor? Ideally, I want to ensure that no webbots or screenscrapers can extract data from my website.

Is there a way to detect that a webbot or screen scraper is running?


Solution

  • It is possible to try to detect screen scrapers:

    Use cookies and timing checks; this makes things harder for out-of-the-box screen scrapers. Also check for JavaScript support, which most scrapers lack. Check the browser metadata sent with each request (for example, the User-Agent header) to verify that the client really is a web browser.
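
    The checks above could be combined into a simple score per request. The following is only a rough, framework-agnostic sketch in Python, not ASP.NET code: the cookie names `sid` and `js_ok`, the `suspicion_score` function, and the User-Agent pattern are all hypothetical, and it assumes the page sets `js_ok=1` from a small script on the first visit. All of these signals can be forged, so treat the score as a hint rather than proof.

```python
import re
from typing import Mapping

# Hypothetical cookie names, chosen for illustration only.
SESSION_COOKIE = "sid"    # set by the server on the first response
JS_COOKIE = "js_ok"       # assumed to be set to "1" by a small script in the page

# Loose pattern for common browser User-Agent strings (an assumption, easy to spoof).
BROWSER_UA = re.compile(
    r"Mozilla/\d\.\d .*(Chrome|Firefox|Safari|Edg|MSIE|Trident)", re.I
)


def suspicion_score(headers: Mapping[str, str],
                    cookies: Mapping[str, str],
                    is_first_visit: bool) -> int:
    """Return 0..3; a higher value means the client looks less like a browser."""
    score = 0

    # 1. The User-Agent header does not look like a mainstream browser.
    if not BROWSER_UA.search(headers.get("User-Agent", "")):
        score += 1

    # Cookie checks only make sense once the client has had a chance to store them.
    if not is_first_visit:
        # 2. The client did not send back the session cookie.
        if SESSION_COOKIE not in cookies:
            score += 1
        # 3. The script-set cookie is missing, so JavaScript probably did not run.
        if cookies.get(JS_COOKIE) != "1":
            score += 1

    return score
```

    A request that scores high on several signals might be challenged or throttled rather than blocked outright, since privacy-conscious users also disable cookies or JavaScript.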

    You can also count requests per minute. A user driving a browser can only make a small number of requests per minute, so server-side logic that detects too many requests from one IP address within a minute can presume that screen scraping is taking place and deny access to that address for some period of time. If this starts to affect legitimate crawlers, log each IP address that gets blocked and allow the legitimate ones as needed.
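
    The per-minute check described above could look roughly like the following. This is a minimal in-memory sketch in Python rather than ASP.NET code; the `RateLimiter` class, its parameter names, and the default limits are assumptions made for illustration, and a real site would tune the thresholds and persist the state somewhere shared.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Sliding-window request counter per IP with a temporary block (illustrative)."""

    def __init__(self, max_requests=60, window=60.0, block_seconds=600.0, allowlist=()):
        self.max_requests = max_requests    # requests allowed per window (assumed value)
        self.window = window                # window length in seconds
        self.block_seconds = block_seconds  # how long an offending IP stays blocked
        self.allowlist = set(allowlist)     # IPs of crawlers that are never blocked
        self.hits = defaultdict(deque)      # ip -> timestamps of recent requests
        self.blocked_until = {}             # ip -> time at which the block expires
        self.block_log = []                 # (ip, time) entries for later review

    def allow(self, ip, now=None):
        """Return True if the request should be served, False if it is refused."""
        now = time.monotonic() if now is None else now

        if ip in self.allowlist:
            return True
        if ip in self.blocked_until and self.blocked_until[ip] > now:
            return False

        # Drop timestamps that have fallen out of the window, then record this one.
        recent = self.hits[ip]
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        recent.append(now)

        if len(recent) > self.max_requests:
            # Too many requests: block the IP for a while and log it for review.
            self.blocked_until[ip] = now + self.block_seconds
            self.block_log.append((ip, now))
            recent.clear()
            return False
        return True
```

    Reviewing `block_log` from time to time is how a legitimate crawler that got caught would be noticed and moved into the allowlist.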

    You can also use http://www.copyscape.com/ to protect your content; it will at least tell you who is reusing your data.

    See this question also:

    Protection from screen scraping

    Also take a look at

    http://blockscraping.com/

    Nice doc about screen scraping:

    http://www.realtor.org/wps/wcm/connect/5f81390048be35a9b1bbff0c8bc1f2ed/scraping_sum_jun_04.pdf?MOD=AJPERES&CACHEID=5f81390048be35a9b1bbff0c8bc1f2ed

    How to prevent screen scraping:

    http://mvark.blogspot.com/2007/02/how-to-prevent-screen-scraping.html