I'm using ML.NET to detect spikes within a list of values (with corresponding dates).
As I understand it, each prediction in the output is a vector of the following three values:
0.0 (Alert: is this a spike?)
5.23 (Score: actual value minus predicted value)
0.20028081299669430182 (P-Value)
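For reference, this is roughly how I read those three slots out of a single prediction vector (a minimal sketch that uses the EventsPrediction class and the predictions list produced by the code further down; the index-to-meaning mapping is based on the output shown above):

// Sketch: unpack one prediction vector from the list produced below.
EventsPrediction p = predictions[0];
double alert  = p.Prediction[0]; // 1.0 when the point is flagged as a spike, otherwise 0.0
double score  = p.Prediction[1]; // actual value minus the value the model predicted
double pValue = p.Prediction[2]; // small p-values indicate an unlikely (anomalous) point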
Most of the time the spike detection works well; however, in many other cases I receive incorrect predictions, as if the input were totally different.
For example, in one specific scenario the input value 2 gets the following prediction:
1.0 (Alert)
11.739639282226563 (Score: this is not reasonable, since the score is higher than the value itself!)
0.00028081299669430182 (P-Value)
Code:
// Build the ML.NET context and load the time series into an IDataView.
MLContext machineLearningContext = new();
IDataView dataView = machineLearningContext.Data.LoadFromEnumerable(aggregationEvents);

string inputColumnName = nameof(EventsData.Value);
string outputColumnName = nameof(EventsPrediction.Prediction);

// Detect spikes with the SSA (Singular Spectrum Analysis) anomaly detector.
IDataView transformedData = machineLearningContext.Transforms
    .DetectSpikeBySsa(outputColumnName, inputColumnName,
        confidence: aggregationDefinition.MinimumConfidenceForAnomalyDetection,
        trainingWindowSize: aggregationDefinition.TrainingWindowSizeForAnomalyDetection,
        seasonalityWindowSize: 30,
        pvalueHistoryLength: 30)
    .Fit(dataView).Transform(dataView);

// Materialize the predictions (one vector per input row).
List<EventsPrediction> predictions = machineLearningContext.Data
    .CreateEnumerable<EventsPrediction>(transformedData, reuseRowObject: false)
    .ToList();
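To see which rows trigger the false positives, I dump each input next to its prediction vector (an illustrative sketch only; it assumes aggregationEvents is an indexable list in the same order as the output rows):

// Sketch: pair each input event with its prediction and print the flagged rows.
for (int i = 0; i < predictions.Count; i++)
{
    double[] p = predictions[i].Prediction;
    if (p[0] == 1.0) // alert flag
    {
        Console.WriteLine($"{aggregationEvents[i].FromTime}: value={aggregationEvents[i].Value}, " +
                          $"score={p[1]}, p-value={p[2]}");
    }
}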
Classes:
public class EventsData
{
    [LoadColumn(0)]
    public DateTime FromTime { get; set; }

    [LoadColumn(1)]
    public float Value { get; set; }
}

public class EventsPrediction
{
    // The SSA spike detector emits three values per row: Alert, Score, P-Value.
    [VectorType(3)]
    public double[] Prediction { get; set; }
}
Because of these wrong predictions, I get a lot of false positive spikes. What am I doing wrong?
Since there is no constant seasonality in the data, setting seasonalityWindowSize to its minimum value (2) fixed this issue!
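For completeness, this is the adjusted call (the same pipeline as above, with only seasonalityWindowSize changed):

IDataView transformedData = machineLearningContext.Transforms
    .DetectSpikeBySsa(outputColumnName, inputColumnName,
        confidence: aggregationDefinition.MinimumConfidenceForAnomalyDetection,
        trainingWindowSize: aggregationDefinition.TrainingWindowSizeForAnomalyDetection,
        seasonalityWindowSize: 2,   // minimum allowed value; the data has no stable seasonality
        pvalueHistoryLength: 30)
    .Fit(dataView).Transform(dataView);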