google-analytics web-crawler phantomjs anemone

Prevent fake analytics statistics with custom crawler


Is there a way to prevent fake Google Analytics statistics when using PhantomJS and/or a Ruby crawler like Anemone?

Our monitoring tool (which is based on both of them) crawls our clients' sites and updates the status of each link in a specific domain.

The problem is that this simulates a huge amount of traffic.

Is there a way to say something like "I'm a robot, don't track me" with a cookie, header or something?

(Adding the crawler IPs to Google Analytics [as a filter] may not be the best solution.)


Solution

  • I found a quick solution for this specific problem. The easiest way to exclude a crawler that executes JavaScript (like PhantomJS) from all Google Analytics statistics is to block the Google Analytics domains in /etc/hosts on the machine that runs the crawler:

    127.0.0.1    www.google-analytics.com
    127.0.0.1    google-analytics.com
    

    Because the tracking requests from the crawler machine never reach Google, no fake data is recorded, and you don't have to add a filter for each of your clients.

    (Thanks for the other answers.)
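
If editing /etc/hosts is not an option, roughly the same effect could be achieved inside the PhantomJS script itself by aborting requests to those hosts. The following is only a minimal sketch, not part of the original answer: it assumes a PhantomJS version whose `page.onResourceRequested` callback receives a second `networkRequest` argument with an `abort()` method (1.9 or later), and the helper names `hostnameOf`, `isAnalyticsUrl` and `blockAnalytics` are made up for illustration. The host list simply mirrors the two /etc/hosts entries above; any other tracking hostname a site uses would have to be added by hand.

```javascript
// Hostnames to block; mirrors the two /etc/hosts entries from the answer.
var BLOCKED_HOSTS = ['www.google-analytics.com', 'google-analytics.com'];

// Extract the hostname from an absolute URL. A regex is used instead of the
// URL class because PhantomJS ships an old JavaScript engine.
// (Hypothetical helper; URLs with user:pass@ credentials are not handled.)
function hostnameOf(url) {
  var match = /^[a-z][a-z0-9+.-]*:\/\/([^\/?#:]+)/i.exec(url);
  return match ? match[1].toLowerCase() : null;
}

// True only when the request targets one of the blocked hosts exactly.
function isAnalyticsUrl(url) {
  var host = hostnameOf(url);
  return host !== null && BLOCKED_HOSTS.indexOf(host) !== -1;
}

// Install a request filter on a PhantomJS page object so that matching
// requests are aborted before they are sent.
function blockAnalytics(page) {
  page.onResourceRequested = function (requestData, networkRequest) {
    if (isAnalyticsUrl(requestData.url)) {
      networkRequest.abort();
    }
  };
}

// Usage inside a PhantomJS script (example URL is a placeholder):
//   var page = require('webpage').create();
//   blockAnalytics(page);
//   page.open('http://example.com/', function (status) { phantom.exit(); });
```

Unlike the /etc/hosts approach, this only affects pages opened through a page object that has the filter installed, so a separate Ruby crawler such as Anemone would not be covered by it.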