php, ripping

Preventing data stealing


I know it's impossible to completely stop people from copying our data, but I have a large database and I want to at least stop automated scripts from scraping all of it.

My ideas so far:

  • render the data with JavaScript or encode the HTML = heavy, and could easily be decoded
  • reCAPTCHA on the search = no way, users will just leave my website
  • inserting random data and tags into the site HTML to defeat regex-based ripping = good?
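
To make the third idea concrete, here is a rough sketch in Python (not PHP, purely for illustration). The function names, the `decoy_chance` parameter and the class-name scheme are all made up for this example. It splits a number into per-digit spans with random class names and mixes in decoy spans that only the generated CSS marks as hidden:

```python
import random
import string


def random_class(rng, length=4):
    """Return a random class name made of lowercase letters (hypothetical scheme)."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def obfuscate_number(value, rng=None, decoy_chance=0.5):
    """Render the digits of `value` as spans mixed with hidden decoy spans.

    Returns (html, css). Only the CSS reveals which spans are decoys.
    """
    if rng is None:
        rng = random.Random()
    spans = []
    hidden = []
    used = set()

    def fresh_class():
        # Keep drawing until we get a class name not used in this snippet.
        while True:
            name = random_class(rng)
            if name not in used:
                used.add(name)
                return name

    for digit in str(value):
        # With probability `decoy_chance`, put a decoy span before the real digit.
        if rng.random() < decoy_chance:
            cls = fresh_class()
            hidden.append(cls)
            spans.append(f'<span class="{cls}">{rng.randint(0, 99)}</span>')
        spans.append(f'<span class="{fresh_class()}">{digit}</span>')

    # One rule per decoy class; a browser hides them, a naive regex does not.
    css = "".join(f".{cls}{{display:none}}" for cls in hidden)
    return "".join(spans), css
```

A scraper that also fetches the CSS and applies the `display:none` rules can still recover the number, so this only raises the cost of ripping. It also hurts accessibility and copy/paste for real users, so it is probably best kept to a few high-value fields.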

Any ideas are appreciated.


Solution

  • I think Alexa inserts random tags into its markup, and it gave me a heck of a time when I tried to mine it. They put extra tags in the Alexa rankings, like <span class="a5r">35</span><span class="et4">52</span><span class="arer">16</span>, and unless you also downloaded the style sheet and looked at the rendering rules, you couldn't figure out what number that was supposed to be.

    If I had been patient enough, I could have "rendered" the numbers and then mined them, but it just wasn't worth it for me. Limiting page requests from each client to a humanly possible rate (50/min or something) would probably work well.
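
A minimal sketch of that kind of limit, written in Python for illustration rather than PHP. The class name `SlidingWindowLimiter`, the client key (an IP address in the usage below) and the in-memory storage are assumptions of this example. A real site running several PHP workers would need a shared store such as a database or cache instead of a per-process dictionary.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client."""

    def __init__(self, limit=50, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.hits = defaultdict(deque)  # client key -> timestamps of recent requests

    def allow(self, client):
        """Record a request from `client` and return True if it is within the limit."""
        now = self.clock()
        recent = self.hits[client]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False  # over the limit: the caller should reject this request
        recent.append(now)
        return True


limiter = SlidingWindowLimiter(limit=50, window=60.0)
if not limiter.allow("203.0.113.7"):  # example client key: an IP address
    print("too many requests")
```

When `allow` returns False, the usual response is HTTP 429 (Too Many Requests). Keying on the IP address alone is easy to get around with proxies, so treat this as a speed bump for casual scripts rather than real protection.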