I am trying to cross-validate a forecasting model on a time series stored in a pandas dataframe, using TimeSeriesSplit from scikit-learn together with a forecaster from sktime. The dataframe df has a daily format:
timepoint balance
0 2017-03-01 1.0
1 2017-04-01 0.0
2 2017-05-01 2.0
3 2017-06-01 3.0
4 2017-07-01 0.0
...
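To make the folds concrete, here is a minimal pure-NumPy sketch of the expanding-window scheme that scikit-learn's TimeSeriesSplit produces with its defaults (n_splits=5, no gap, no max_train_size). The helper expanding_window_splits is hypothetical and only mirrors that default behaviour, and the toy frame below is made up to match the question's column names; in real code use TimeSeriesSplit itself.

```python
import numpy as np
import pandas as pd


def expanding_window_splits(n_samples, n_splits=5):
    """Hypothetical helper mirroring TimeSeriesSplit's default folds.

    Each test block has n_samples // (n_splits + 1) rows, and the training
    set is every row before that block (an expanding window).
    """
    test_size = n_samples // (n_splits + 1)
    indices = np.arange(n_samples)
    for test_start in range(n_samples - n_splits * test_size, n_samples, test_size):
        yield indices[:test_start], indices[test_start:test_start + test_size]


# Toy frame shaped like the question's df (illustrative values only).
df = pd.DataFrame({
    "timepoint": pd.date_range("2017-03-01", periods=12, freq="D"),
    "balance": np.arange(12, dtype=float),
})

folds = list(expanding_window_splits(len(df)))
for train_idx, test_idx in folds:
    print(f"train rows {train_idx[0]}-{train_idx[-1]}, test rows {test_idx[0]}-{test_idx[-1]}")
```

With 12 rows this yields five folds whose test blocks are rows 2-3, 4-5, 6-7, 8-9 and 10-11, each trained on every earlier row. Note that each test fold is a contiguous slice of df, so its dates are available in cv_test['timepoint'].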
I am trying to use Prophet (through sktime's wrapper) and run the following code:
# Packages
from math import sqrt

import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit
from sktime.forecasting.fbprophet import Prophet

# Preparation
tscv = TimeSeriesSplit()  # expanding-window folds, n_splits=5 by default
rmse = []
model_ph = Prophet()

# Cross-validation loop
for train_index, test_index in tscv.split(df):
    cv_train, cv_test = df.iloc[train_index], df.iloc[test_index]
    model_ph.fit(cv_train)
    predictions = model_ph.predict(cv_test.index.values[0], cv_test.index.values[-1])
    true_values = cv_test.values
    rmse.append(sqrt(mean_squared_error(true_values, predictions)))

# Print
print("RMSE: {}".format(np.mean(rmse)))
which leads to the following error:
TypeError: X must be either None, or in an sktime compatible format, of scitype
Series, Panel or Hierarchical, for instance a pandas.DataFrame with sktime
compatible time indices...
I expected the loop to compute the mean_squared_error for each fold and print the mean RMSE instead of raising an error.
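The problem occurs because sktime's Prophet only accepts specific input types. The first two parameters of predict are fh (the forecasting horizon) and X (exogenous data), so in the call above the first test index is taken as fh and the last one is passed as X, which fails the input check. In my case the solution was to create a pd.date_range and pass it as the forecasting horizon fh: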
import pandas as pd
from math import sqrt

for train_index, test_index in tscv.split(df):
    cv_train, cv_test = df.iloc[train_index], df.iloc[test_index]
    model_ph.fit(cv_train)
    # Forecasting horizon: one daily timestamp per row of the test fold
    fh = pd.date_range(cv_test['timepoint'].values[0], periods=len(cv_test), freq='D')
    forecast = model_ph.predict(fh=fh)
    predictions = forecast['balance'].values
    true_values = cv_test['balance'].values
    rmse.append(sqrt(mean_squared_error(true_values, predictions)))
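The fold bookkeeping and RMSE step can be checked without sktime installed. Below is a sketch assuming only pandas and NumPy: Prophet is swapped out for a hypothetical naive_forecast function (it repeats the last training value), and the folds are a compact re-implementation of TimeSeriesSplit's default expanding window rather than the scikit-learn class. As a variation on the date_range approach, the horizon here is built from the test fold's own timestamps with pd.DatetimeIndex, which still lines up with the test rows if the series is not strictly daily. The RMSE is computed directly with NumPy instead of mean_squared_error.

```python
import numpy as np
import pandas as pd


def naive_forecast(train_values, fh):
    """Hypothetical stand-in for Prophet: repeat the last training value over fh."""
    return pd.Series(train_values[-1], index=fh)


def cv_rmse(df, n_splits=5):
    """Expanding-window CV mirroring TimeSeriesSplit defaults; returns (mean RMSE, per-fold RMSE)."""
    n = len(df)
    test_size = n // (n_splits + 1)
    rmse = []
    for test_start in range(n - n_splits * test_size, n, test_size):
        cv_train = df.iloc[:test_start]
        cv_test = df.iloc[test_start:test_start + test_size]
        # Absolute horizon taken from the test fold's own dates
        fh = pd.DatetimeIndex(cv_test["timepoint"])
        predictions = naive_forecast(cv_train["balance"].to_numpy(), fh).to_numpy()
        true_values = cv_test["balance"].to_numpy()
        # RMSE = sqrt(mean of squared errors) for this fold
        rmse.append(float(np.sqrt(np.mean((true_values - predictions) ** 2))))
    return float(np.mean(rmse)), rmse


# Toy data with the question's column names (illustrative values only)
df = pd.DataFrame({
    "timepoint": pd.date_range("2017-03-01", periods=12, freq="D"),
    "balance": np.arange(12, dtype=float),
})
mean_rmse, fold_rmse = cv_rmse(df)
print("RMSE: {}".format(mean_rmse))
```

In the real loop, naive_forecast would be replaced by model_ph.fit(cv_train) followed by model_ph.predict(fh=fh), assuming the fitted series has a time index compatible with an absolute DatetimeIndex horizon.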