I am struggling to work out how to implement TimeSeriesSplit in sklearn.
The suggested answer at the link below yields the same ValueError.
sklearn TimeSeriesSplit cross_val_predict only works for partitions
Here is the relevant bit of my code:
from sklearn.model_selection import TimeSeriesSplit, cross_val_predict
from sklearn import svm
features = df[df.columns[0:6]]
target = df['target']
clf = svm.SVC(random_state=0)
pred = cross_val_predict(clf, features, target, cv=TimeSeriesSplit(n_splits=5).split(features))
ValueError Traceback (most recent call last)
<ipython-input-57-d1393cd05640> in <module>()
----> 1 pred = cross_val_predict(clf, features, target, cv=TimeSeriesSplit(n_splits=5).split(features))
/home/jedwards/anaconda3/envs/py36/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in cross_val_predict(estimator, X, y, groups, cv, n_jobs, verbose, fit_params, pre_dispatch, method)
408 if not _check_is_permutation(test_indices, _num_samples(X)):
--> 409 raise ValueError('cross_val_predict only works for partitions')
411 inv_test_indices = np.empty(len(test_indices), dtype=int)
ValueError: cross_val_predict only works for partitions
cross_val_predict cannot work with TimeSeriesSplit. It requires every sample to appear in exactly one test fold, but the first partition of a TimeSeriesSplit is only ever used for training, so no predictions are made for it.
For example, with the dataset [1, 2, 3, 4, 5], the value 1 never appears in the test set of any fold.
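To see this concretely, here is a minimal sketch that prints the folds for that five-element dataset. The choice of n_splits=4 is an assumption made so that each test fold holds a single value; the variable names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

data = np.array([1, 2, 3, 4, 5])

# n_splits=4 is chosen for illustration so that every test fold has one element.
tscv = TimeSeriesSplit(n_splits=4)

test_values = []
for train_idx, test_idx in tscv.split(data):
    print("train:", data[train_idx], "test:", data[test_idx])
    test_values.extend(data[test_idx].tolist())

# Collect everything that was ever used for testing.
print("values that appear in a test set:", test_values)
```

The value 1 only ever shows up on the training side, so the test folds do not cover the whole dataset, which is what the "only works for partitions" check rejects.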
If you want predictions for 2-5, you can loop through the splits generated by your CV yourself, fit the model on each training fold, and store the predictions for each test fold.
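A rough sketch of such a manual loop might look like the following. The synthetic `features` and `target` arrays are stand-ins for the DataFrame columns in the question (an assumption, since the real data is not shown), and leaving NaN for rows that are never tested is just one possible convention.

```python
import numpy as np
from sklearn import svm
from sklearn.base import clone
from sklearn.model_selection import TimeSeriesSplit

# Synthetic stand-in for the question's data (assumption: 60 rows, 6 features).
rng = np.random.RandomState(0)
target = np.tile([0, 1], 30)  # alternating labels, so every fold sees both classes
features = rng.normal(size=(60, 6)) + target[:, None]

clf = svm.SVC(random_state=0)
tscv = TimeSeriesSplit(n_splits=5)

# NaN marks the rows that never appear in a test fold (the first partition).
pred = np.full(len(target), np.nan)
for train_idx, test_idx in tscv.split(features):
    model = clone(clf)  # fresh, unfitted copy for each fold
    model.fit(features[train_idx], target[train_idx])
    pred[test_idx] = model.predict(features[test_idx])

covered = ~np.isnan(pred)
print("rows with predictions:", covered.sum(), "of", len(pred))
```

With pandas objects like those in the question, the positional indices returned by split would need `.iloc`, e.g. `features.iloc[train_idx]` and `target.iloc[train_idx]`.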