Search code examples
amazon-web-servicesforecastamazon-forecast

AWS Forecast cannot train the predictor due to missing data


This question is close, but doesn't quite help me with a similar issue as I am using a single data set and no related time series.

I am using AWS Forecast with a single time series dataset (no related data, just the main DS). It is a daily data set with about 10 years of data ranging from 2010-2020.

I have 3572 data points in the original data set; I manually filled missing data to ensure there were no missing days in the date range for a total of 3739 data points. I lopped off everything in 2020 to create a validation dataset and then configured the predictor for a 180 day Forecast. I keep getting the following error:

Unable to evaluate this dataset because there is missing data in the evaluation window for all items. Ensure that there is complete data for at least one item in the evaluation window starting from 2019-03-07T00:00:00 up to 2020-01-01T00:00.

There is definitely no missing data, I've double and triple checked the date range and data fill and every day between start and end dates has a data point. I also tried adding a data point for 1/1/2020 (it ended at 12/31/2019) and I continue to get this error. I can't figure out what it's asking me for, except that maybe I'm missing something in my math about the forecast Horizon and Backtest window offset?

Dataset example: enter image description here

Brief model parameters (can share more if I'm missing something pertinent):

Total data points in training data: 3479
forecastHorizon = 180
create_predictor_response=forecast.create_predictor(PredictorName=predictorName, 
                                                  ForecastHorizon=forecastHorizon,
                                                  PerformAutoML= True,
                                                  PerformHPO=False,
                                                  EvaluationParameters= {"NumberOfBacktestWindows": 1, 
                                                                         "BackTestWindowOffset": 180}, 
                                                  InputDataConfig= {"DatasetGroupArn": datasetGroupArn},
                                                  FeaturizationConfig= {"ForecastFrequency": 'D'

Solution

  • I noticed you don't have entry for 6/24/10 (this american date format is the worst btw)

    I faced a similar problem when leaving out days (assuming you're modelling in daily frequency) just like that and having the Forecast automatic filling of gaps to nan values (as opposed to zero which is the default). I suggest you:

    • pre-fill literally every date within the range of training data (and of forecast window, if using related data)
    • choose zero as the option for automatically filling of missing values. I think mean or any other float value would also work for that matter

    let me know if that works! I am also using Forecast and it's good to keep track of possible problems and solutions