Search code examples
drupal-7web-scraping

How to discourage scraping on a Drupal website?


I have a Drupal website that has a ton of data on it. However, people can quite easily scrape the site, due to the fact that Drupal class and IDs are pretty consistent.

  1. Is there any way to "scramble" the code to make it harder to use something like PHP Simple HTML Dom Parser to scrape the site?
  2. Are there other techniques that could make scraping the site a little harder?
  3. Am I fighting a lost cause?

I am not sure if "scraping" is the official term, but I am referring to the process by which people write a script that "crawls" a website and parses sections of it in order to extract data and store it in their own database.


Solution

  • I am not sure but I think that it is quite easy to crawl a website where all contents are public, no matter if the IDs are sequential or not. You should take into account that if a human can read your Drupal site, a script also does.

    Depending on your site's nature if you don't want your content to be indexed by others, you should consider setting registered-user access. Otherwise, I think you are fighting a lost cause.