python time-series arima anomaly-detection

Which threshold to adopt to detect anomalies using an ARIMA model


I am trying to detect anomalies in a time series dataset. I am classifying the predicted values based on thresholds.

Here is a detailed description of what I did:

I split my dataset into training and testing sets, then fitted my ARIMA model on the training set. I used the fitted model to predict the testing observations, then calculated the error between the actual and predicted values:

Error = actual_testing - predicted_testing

Normally, I must choose a threshold to classify each observation based on the calculated error:

If Error > threshold ==> it is an anomaly

Is there any method to choose this threshold value?


Solution

  • One approach is to compute the errors across your training or validation set, then fit a statistical distribution to them, for example a Gaussian (normal) distribution. This normalizes the range of the scores and lets you interpret a score as a probability. You can then set the threshold at, for example, 2 to 6 standard deviations from the mean, depending on how many anomalies you want to flag.
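
The approach above could be sketched roughly as follows, using only the Python standard library. This is a minimal illustration, not a definitive implementation: the function names (`fit_error_distribution`, `flag_anomalies`, `tail_probability`), the multiplier `k=3.0`, and the synthetic residuals are all assumptions made for the example. In practice, `validation_errors` would be the `actual - predicted` residuals from your own ARIMA model. The sketch also assumes that large negative errors should count as anomalies too, so it compares the absolute deviation against the threshold.

```python
import random
from statistics import NormalDist, mean, stdev


def fit_error_distribution(validation_errors):
    """Fit a Gaussian to the residuals by estimating its mean and std dev."""
    return mean(validation_errors), stdev(validation_errors)


def flag_anomalies(errors, mu, sigma, k=3.0):
    """Flag each error lying more than k standard deviations from the mean."""
    return [abs(e - mu) / sigma > k for e in errors]


def tail_probability(error, mu, sigma):
    """Two-sided probability of an error at least this extreme under the fit."""
    z = abs(error - mu) / sigma
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Synthetic stand-in for validation residuals (assumption: roughly Gaussian).
rng = random.Random(0)
validation_errors = [rng.gauss(0.0, 1.0) for _ in range(500)]

# Hypothetical test residuals; two of them are deliberately extreme.
test_errors = [0.2, -0.5, 8.0, 1.1, -9.5]

mu, sigma = fit_error_distribution(validation_errors)
flags = flag_anomalies(test_errors, mu, sigma, k=3.0)

for err, is_anomaly in zip(test_errors, flags):
    p = tail_probability(err, mu, sigma)
    print(f"error={err:6.2f}  p={p:.3g}  anomaly={is_anomaly}")
```

Raising `k` flags fewer points and lowering it flags more. If the residuals are clearly not Gaussian (heavy tails, skew), an empirical quantile of the validation errors may be a safer basis for the threshold than a standard-deviation multiple.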