Tags: c#, .net, machine-learning, ml.net, anomaly-detection

ML.NET DetectSpikeBySsa producing incorrect predictions


I'm using ML.NET to detect spikes within a list of values (with corresponding dates).

As I understand it, each prediction in the output is a vector of the following three values:

0.0, (Is spike?)
5.23, (Score = Actual value - predicted value)
0.20028081299669430182, (PValue)

Most of the time the spike detection works well, but in many other cases I receive incorrect predictions, as if the input were totally different.

For example, in one specific scenario, the input value "2" gets the following prediction:

1.0,
11.739639282226563, (This is not reasonable since the score is higher than the value itself!)
0.00028081299669430182
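
To make the three slots easier to reason about, here is a minimal sketch of a helper that names them. `SpikeResult` and `ImpliedForecast` are hypothetical names, not part of ML.NET, and the forecast calculation assumes the score really is the signed difference described above (actual value minus predicted value).

```csharp
using System;

// Hypothetical helper, not part of ML.NET: gives names to the three
// elements of the prediction vector [alert, score, p-value].
public sealed class SpikeResult
{
    public bool IsSpike { get; }
    public double Score { get; }
    public double PValue { get; }

    private SpikeResult(bool isSpike, double score, double pValue)
    {
        IsSpike = isSpike;
        Score = score;
        PValue = pValue;
    }

    public static SpikeResult FromVector(double[] prediction)
    {
        if (prediction == null || prediction.Length < 3)
            throw new ArgumentException("Expected [alert, score, p-value].", nameof(prediction));

        // A non-zero alert value marks the point as a spike.
        return new SpikeResult(prediction[0] != 0.0, prediction[1], prediction[2]);
    }

    // Assumption: Score = actual - forecast, so forecast = actual - Score.
    public double ImpliedForecast(double actual)
    {
        return actual - Score;
    }
}
```

For the prediction above, `SpikeResult.FromVector(new[] { 1.0, 11.739639282226563, 0.00028081299669430182 }).ImpliedForecast(2.0)` comes out at roughly -9.74. Under that assumption, the model expected a value far below the actual input, which is how the score can exceed the value itself.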

Code:

    MLContext machineLearningContext = new();
    IDataView dataView = machineLearningContext.Data.LoadFromEnumerable(aggregationEvents);

    string inputColumnName = nameof(EventsData.Value);
    string outputColumnName = nameof(EventsPrediction.Prediction);

    IDataView transformedData = machineLearningContext.Transforms
                        .DetectSpikeBySsa(outputColumnName, inputColumnName,
                                          confidence: aggregationDefinition.MinimumConfidenceForAnomalyDetection,
                                          trainingWindowSize: aggregationDefinition.TrainingWindowSizeForAnomalyDetection,
                                          seasonalityWindowSize: 30,
                                          pvalueHistoryLength: 30)
                        .Fit(dataView).Transform(dataView);

    List<EventsPrediction> predictions = machineLearningContext.Data.CreateEnumerable<EventsPrediction>(transformedData, reuseRowObject: false).ToList();

Classes:

    public class EventsData
    {
        [LoadColumn(0)]
        public DateTime FromTime { get; set; }

        [LoadColumn(1)]
        public float Value { get; set; }
    }

    public class EventsPrediction
    {
        [VectorType(3)] // three elements: alert, score, p-value
        public double[] Prediction { get; set; }
    }

Because of these wrong predictions, I get a lot of false positive spikes. What am I doing wrong?


Solution

  • The data has no constant seasonality, so setting seasonalityWindowSize to its minimum value (2) fixed the issue.
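
The change can be sketched as below. This is a minimal, self-contained version rather than the original code: it assumes the Microsoft.ML and Microsoft.ML.TimeSeries packages are installed, drops the `FromTime` column for brevity, and uses placeholder values for `confidence` and `trainingWindowSize` where the question reads them from `aggregationDefinition`. `SpikeDemo.Detect` is a hypothetical wrapper name.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public class EventsData
{
    public float Value { get; set; }
}

public class EventsPrediction
{
    // Three elements: alert, score, p-value.
    [VectorType(3)]
    public double[] Prediction { get; set; }
}

public static class SpikeDemo
{
    // Hypothetical wrapper around the pipeline from the question.
    public static List<EventsPrediction> Detect(IEnumerable<EventsData> events)
    {
        var mlContext = new MLContext();
        IDataView dataView = mlContext.Data.LoadFromEnumerable(events);

        var estimator = mlContext.Transforms.DetectSpikeBySsa(
            outputColumnName: nameof(EventsPrediction.Prediction),
            inputColumnName: nameof(EventsData.Value),
            confidence: 95.0,           // placeholder, tune for your data
            pvalueHistoryLength: 30,
            trainingWindowSize: 90,     // placeholder, tune for your data
            seasonalityWindowSize: 2);  // the fix: no seasonality assumed

        IDataView transformed = estimator.Fit(dataView).Transform(dataView);

        return mlContext.Data
            .CreateEnumerable<EventsPrediction>(transformed, reuseRowObject: false)
            .ToList();
    }
}
```

If the series does have a repeating pattern, seasonalityWindowSize should instead be an upper bound on the longest period you expect; a value such as 30 on non-seasonal data is what led to the misleading forecasts here.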