I have the following timeseries data frame extracted from a larger one.
df_test = df.loc[(df['time'] >= '2015-05-01') & (df['time'] <= '2015-05-09')]
df_test.set_index('time')
The head of the data look like this:
                       time  total_consumption
122400  2015-05-01 00:01:00            106.391
122401  2015-05-01 00:11:00            120.371
122402  2015-05-01 00:21:00            109.292
122403  2015-05-01 00:31:00             99.838
122404  2015-05-01 00:41:00             97.387
Using SARIMAX, I obtained this model:
mod = sm.tsa.statespace.SARIMAX(np.asarray(df_test['total_consumption']),
                                order=(1, 1, 1),
                                seasonal_order=(0, 1, 1, 12),
                                enforce_stationarity=False,
                                enforce_invertibility=False)
results_final = mod.fit()
I then tried to get the prediction based on the model:
start = pd.to_datetime('2015-05-08 00:01:00')
pred = results_final.get_prediction(start, dynamic=False)
pred_ci = pred.conf_int()
However, when I try to get a prediction for the end of my data frame with get_prediction(), I get this error message and can't figure out why:
ValueError: Got a string for start and dates is None
Thank you
I guess the problem is that you are not using a time index. If you want to use dates, the data needs to be a pandas Series with a date/time index.
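Try dropping the np.asarray and passing df_test['total_consumption'] directly when creating the model. Note that df_test.set_index('time') returns a new DataFrame rather than modifying df_test in place, so the result has to be assigned back (df_test = df_test.set_index('time')) for the series to carry the time index.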
A numpy array does not carry any date information, so dates cannot be used to specify the forecast periods. With numpy arrays, the forecast or prediction periods need to be specified with the usual integer indices.