Tags: algorithm, statistics, detection

Detect points that are very scattered from the rest of the data


I have a set of results (numbers), and I would like to know whether a given result is very good or very bad compared to the previous results (only the previous ones).

Each result is a number ∈ ℝ⁺. For example, if you have the sequence 10, 11, 10, 9.5, 16, then 16 is clearly a very good result compared to the previous ones. I would like to find an algorithm to detect this situation (a very good/bad result compared to the previous results).


A more general way to state this problem: how do you determine whether a point, in a given set of data, is scattered far from the rest of the data?

Now, that might look like a peak detection problem, but since the previous values are not constant there are many tiny peaks, and I only want the big ones.

My first idea was to compute the mean and flag results that deviate by more than some multiple of the standard deviation, but that is quite limited. Indeed, a single huge/low value among the previous results dramatically changes the mean and standard deviation, so subsequent results have to be even more extreme to beat the threshold (in order to be detected), and therefore many points are not (properly) detected.
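For reference, here is a minimal sketch of that mean/standard-deviation idea, and a demonstration of the weakness described above (the function name and the threshold `k` are illustrative choices, not from the question):

```python
import statistics

def is_outlier_naive(previous, value, k=2.0):
    """Flag `value` when it lies more than k standard deviations
    from the mean of the previous results."""
    mean = statistics.mean(previous)
    stdev = statistics.stdev(previous)  # needs at least 2 previous values
    return abs(value - mean) > k * stdev

print(is_outlier_naive([10, 11, 10, 9.5], 16))      # True: 16 stands out
print(is_outlier_naive([10, 11, 10, 9.5, 50], 16))  # False: the 50 inflates mean/stdev
```

The second call shows the problem: one extreme value (50) in the history inflates both the mean and the standard deviation so much that 16 is no longer detected.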

I'm quite sure this must be a well-known problem.

Can anyone help me with this?


Solution

  • This kind of problem is called Anomaly Detection.
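One common starting point in that direction, sketched here under the assumption of a plain-Python setting (this is an illustrative technique, not necessarily what the answerer had in mind): replace the mean/standard deviation with the median and the median absolute deviation (MAD), which are far less distorted by a single extreme value in the history. The function name, the threshold of 3.0, and the 0.6745 consistency constant (which makes the MAD comparable to a standard deviation for normal data) are all assumptions for the sketch:

```python
import statistics

def is_anomaly_robust(previous, value, k=3.0):
    """Flag `value` using a robust z-score based on the median and
    the median absolute deviation (MAD) of the previous results."""
    med = statistics.median(previous)
    # MAD: median of absolute deviations from the median
    mad = statistics.median(abs(x - med) for x in previous)
    if mad == 0:  # all previous values identical
        return value != med
    # 0.6745 * |value - med| / mad approximates a z-score for normal data
    return 0.6745 * abs(value - med) / mad > k

print(is_anomaly_robust([10, 11, 10, 9.5, 50], 16))  # True: not swamped by the 50
```

With the same history containing the extreme 50, the robust score still flags 16, whereas the mean/standard-deviation test sketched in the question does not.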