python, machine-learning, dataset, data-science, anomaly-detection

Click fraud detection with a lot of zero data


I have a data set of advertising publishers. Publishers earn money for each click on their advertisements. The data set consists of a list of publishers with the corresponding number of clicks and the number of transactions they caused. The problem is to decide whether a publisher is cheating by clicking its own advertisements to earn more money. However, some of these publishers have a very small total number of clicks (below 10), and therefore their number of transactions is 0.

My question is: what should I do with these zero values? They ruin the Gaussian distribution of my data. Should I just eliminate them from my data set? Is there a statistical approach for handling this?

By the way, I'm very new to data analysis, so excuse me if the answer is obvious, but I couldn't find an answer on the web.


Solution

  • Remove the zeros

    >>> x = [0, 2, 0, 5, 0, 6, 77, 8, 9]
    >>> # keep only the non-zero values
    >>> [v for v in x if v != 0]
    [2, 5, 6, 77, 8, 9]
    

    Note that the shape of your Gaussian distribution will change once the zeros are removed.
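The question describes one row per publisher rather than a bare list, so the same removal can be sketched on rows. This is only an illustration under assumptions: the publisher names, click counts, and the helpers `drop_zero_transactions` and `summarize` are all hypothetical, and the transaction counts simply reuse the toy values from the list above. Comparing the mean and standard deviation before and after gives a rough view of how the distribution shifts once the zeros are gone.

```python
from statistics import mean, stdev

# Hypothetical (publisher, clicks, transactions) rows; not data from the question.
# The transaction counts reuse the toy values 0, 2, 0, 5, 0, 6, 77, 8, 9.
rows = [
    ("pub_a", 4, 0),
    ("pub_b", 120, 2),
    ("pub_c", 7, 0),
    ("pub_d", 300, 5),
    ("pub_e", 9, 0),
    ("pub_f", 410, 6),
    ("pub_g", 5200, 77),
    ("pub_h", 530, 8),
    ("pub_i", 610, 9),
]


def drop_zero_transactions(rows):
    """Keep only the rows whose transaction count is non-zero."""
    return [row for row in rows if row[2] != 0]


def summarize(rows):
    """Return (mean, sample standard deviation) of the transaction counts."""
    counts = [row[2] for row in rows]
    return mean(counts), stdev(counts)


kept = drop_zero_transactions(rows)
print("all rows :", summarize(rows))
print("non-zero :", summarize(kept))
```

In this toy data the mean moves upward after filtering, because every dropped row sat at zero. Whether that shift is acceptable depends on what the remaining analysis assumes about the data.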