
Cross-validation for time-series data: Convert a user-defined list of tuples with inner lists of lists to a list of tuples of flat lists for use in GridSearchCV

I have time series data and want to do walk-forward cross-validation for my ML model in Python. To create the splits, I did the following:

cv_split = [(list_of_lists[:i], list_of_lists[i:i+1]) for i in range(1, len(list_of_lists))] 

Here list_of_lists is, e.g., [[0,1,2],[3,4],[5,6,7,8], ...],
where each inner list stands for the observations in a particular year.

The result for cv_split is a list of tuples whose elements are lists of lists; each tuple looks like ([[0,1,2],[3,4]], [[5,6,7,8]]).
This is the problem, because GridSearchCV does not accept this form.

I know that the following form for my cv_split will work:
([0,1,2,3,4], [5,6,7,8]) (a list of tuples of flat lists).
How do I get from ([[0,1,2],[3,4]], [[5,6,7,8]]) to ([0,1,2,3,4], [5,6,7,8])?

In more detail, this is what I have now:

[([[0,1,2],[3,4]], [[5,6,7,8]]), ...]

And I need the following form:

[([0,1,2,3,4], [5,6,7,8]), ...]

I am new to Python and would appreciate any help, ideally with some explanation.


  • Here is how you can use a nested list comprehension:

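    lst = ([[0,1,2],[3,4]], [[5,6,7,8]])
    # For each element l of the tuple, walk its inner lists b and collect their items a
    t = tuple([[a for b in l for a in b] for l in lst])
    print(t)

    Output:

    ([0, 1, 2, 3, 4], [5, 6, 7, 8])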
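
As an alternative sketch (not the approach used in this answer), the same flattening can be written with `itertools.chain.from_iterable` from the standard library, which some readers may find easier to follow than the double `for` inside the comprehension. The helper name `flatten_split` is hypothetical and only used for illustration.

```python
from itertools import chain


def flatten_split(split):
    """Flatten each list of lists in a (train, test) tuple into one flat list."""
    # chain.from_iterable concatenates the inner lists in their original order
    return tuple(list(chain.from_iterable(part)) for part in split)


lst = ([[0, 1, 2], [3, 4]], [[5, 6, 7, 8]])
print(flatten_split(lst))  # ([0, 1, 2, 3, 4], [5, 6, 7, 8])
```

Both versions produce the same tuple of flat lists; which one to use is mostly a matter of taste.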


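    To flatten every tuple in a full list of splits, wrap the nested comprehension in an outer one:

    lst = [([[0,1,2],[3,4]], [[5,6,7,8]]),
           ([[0,1,2],[3,4],   [5,6,7,8]],[[9,10]]),
           ([[0,1,2],[3,4],   [5,6,7,8],  [9,10]],[[11,12,13]]),
           ([[0,1,2],[3,4],   [5,6,7,8],  [9,10],  [11,12,13]],[[14,15,16]])]
    # The outer comprehension visits each (train, test) tuple tt; the inner part flattens it
    ls = [tuple([[a for b in l for a in b] for l in tt]) for tt in lst]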
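
If the nested form is not needed for anything else, the flat splits can also be built directly from `list_of_lists`. The following is a minimal sketch under the assumption that `list_of_lists` holds one list of row indices per year, as described in the question; the sample data is illustrative only.

```python
# One list of row indices per year (illustrative sample data)
list_of_lists = [[0, 1, 2], [3, 4], [5, 6, 7, 8], [9, 10]]

cv_split = [
    (
        # Training indices: all years before year i, flattened into one list
        [idx for year in list_of_lists[:i] for idx in year],
        # Test indices: year i only (copied so the input is not shared)
        list(list_of_lists[i]),
    )
    for i in range(1, len(list_of_lists))
]

for train, test in cv_split:
    print(train, test)
```

The `cv` parameter of GridSearchCV accepts an iterable of (train, test) index splits, so a list in this form can be passed as `cv=cv_split`.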