Sklearn's TimeSeriesSplit is a useful way to implement a time series equivalent of k-fold cross-validation. However, it appears to support only a single-step forecast horizon, not multi-step horizons. For example, from a dataset of [1, 2, 3, 4, 5]
it can be used to create the following train and test sets respectively
[1, 2], [3]
[1, 2, 3], [4]
[1, 2, 3, 4], [5].
What it cannot produce is a split with a multi-step forecast horizon. For a dataset of [1, 2, 3, 4, 5, 6], a multi-step time series split would look like
[1, 2], [3, 4]
[1, 2, 3], [4, 5]
[1, 2, 3, 4], [5, 6],
for example.
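The splits above can be produced with a small custom generator. This is only an illustrative sketch, not part of sklearn; the function name, the `horizon` and `min_train_size` parameters, and the use of zero-based indices are all my own choices for the example.

```python
# Hypothetical helper (not provided by sklearn): yield expanding-window
# train/test index pairs with a multi-step forecast horizon.
def multistep_time_series_split(n_samples, horizon=2, min_train_size=2):
    """Yield (train_indices, test_indices) pairs.

    Each split trains on an expanding window and tests on the next
    `horizon` observations, mirroring the example above.
    """
    for end in range(min_train_size, n_samples - horizon + 1):
        train = list(range(end))
        test = list(range(end, end + horizon))
        yield train, test

# With 6 observations and horizon=2 this reproduces the splits shown
# (using zero-based positions rather than the values 1..6):
for train, test in multistep_time_series_split(6, horizon=2):
    print(train, test)
```

The same idea could be wrapped in a class with a `split(X)` method to make it usable with sklearn's cross-validation utilities, which accept any object implementing that interface.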
I would like to know if there is a good reason for this. I am able to implement my own version of TimeSeriesSplit, so this isn't a practical problem, but I am new to the field of forecasting. My understanding is that such a procedure is statistically the best way to measure the accuracy of a model. I therefore find it curious that sklearn does not provide this functionality out of the box. Is there a reason why? And have I overlooked some reason why having a multi-step forecast horizon, as shown above, means my method of evaluating accuracy should change?
There is a reason, but it is not a "good" one. Most established forecasting methods train a model on one-step-ahead errors, since for multi-step forecasting they do recursive forecasting rather than direct forecasting anyway (i.e., for most forecasting methods there is no use for a multi-step time series split).
I suspect that's why the sklearn-authors didn't bother.
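To make the recursive/direct distinction concrete, here is a minimal sketch of recursive forecasting: a model trained only on one-step-ahead pairs is applied repeatedly, feeding each prediction back in as the next input. The lag-1 feature and the toy linear series are assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy series chosen for illustration: 1, 2, ..., 10.
y = np.arange(1.0, 11.0)

# Train on one-step-ahead pairs (y[t-1] -> y[t]) with a lag-1 feature.
X_train = y[:-1].reshape(-1, 1)
y_train = y[1:]
model = LinearRegression().fit(X_train, y_train)

# Recursive forecasting: each prediction becomes the next model input,
# so a single one-step model yields a multi-step forecast.
horizon = 3
last = y[-1]
forecasts = []
for _ in range(horizon):
    last = model.predict(np.array([[last]]))[0]
    forecasts.append(last)

print(forecasts)
```

Because the model only ever sees one-step-ahead targets during training, a one-step (single-horizon) split is all that is needed to fit it; the multi-step behaviour comes entirely from this feedback loop at prediction time.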
If you want to use R instead of Python, the tsCV() function from the forecast package performs time series splits of the type
[1, 2], [4]
[1, 2, 3], [5]
[1, 2, 3, 4], [6]
However, tsCV() doesn't return the time series splits themselves; instead, it takes a time series and a forecast model as input and returns a matrix of cross-validation errors.
I don't know whether it does this exactly the way you want, though.