python list time-series tuples cross-validation

Cross-validation for time-series data: Convert user-defined list of tuples with inner lists of lists to list of tuples for applying in GridSearchCV


I have time-series data and want to do walk-forward cross-validation for my ML model in Python. To create the splits I did the following:

cv_split = [(list_of_lists[:i], list_of_lists[i:i+1]) for i in range(1, len(list_of_lists))] 

Here list_of_lists is, for example, [[0,1,2],[3,4],[5,6,7,8], ...],
where each inner list stands for the observations in a particular year.

The resulting cv_split is a list of tuples whose parts are lists of lists; each tuple looks like ([[0,1,2],[3,4]], [[5,6,7,8]]).
This is the problem, because GridSearchCV does not accept this form.

I know that the following form will work for each element of cv_split:
([0,1,2,3,4], [5,6,7,8]) (that is, a list of tuples of flat lists).
How do I get from ([[0,1,2],[3,4]], [[5,6,7,8]]) to ([0,1,2,3,4], [5,6,7,8])?

Here is a more complete example.

Now I have:

[([[0,1,2],[3,4]], [[5,6,7,8]]),
 ([[0,1,2],[3,4],[5,6,7,8]], [[9,10]]),
 ([[0,1,2],[3,4],[5,6,7,8],[9,10]], [[11,12,13]]),
 ([[0,1,2],[3,4],[5,6,7,8],[9,10],[11,12,13]], [[14,15,16]])]

And I need the following form:

[([0,1,2,3,4], [5,6,7,8]),
 ([0,1,2,3,4,5,6,7,8], [9,10]),
 ([0,1,2,3,4,5,6,7,8,9,10], [11,12,13]),
 ([0,1,2,3,4,5,6,7,8,9,10,11,12,13], [14,15,16])]

I am new to Python and would appreciate any help, ideally with some explanation.


Solution

  • You can flatten each part of the tuple with a nested list comprehension:

    lst = ([[0,1,2],[3,4]], [[5,6,7,8]])
    
    t = tuple([[a for b in l for a in b] for l in lst])
    
    print(t)
    

    Output:

    ([0, 1, 2, 3, 4], [5, 6, 7, 8])
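
Since the question asks for some explanation, here is a sketch of the same flattening written as explicit loops. The helper name flatten_pair and the loop variable names are only illustrative and are not part of the answer above; the behaviour is meant to match the comprehension for this input.

```python
def flatten_pair(pair):
    """Flatten each list of lists inside a (train, test) pair by one level."""
    flat_parts = []
    for part in pair:              # part is e.g. [[0, 1, 2], [3, 4]]
        flat = []
        for year in part:          # year is e.g. [0, 1, 2]
            for index in year:     # index is a single observation
                flat.append(index)
        flat_parts.append(flat)
    return tuple(flat_parts)


lst = ([[0, 1, 2], [3, 4]], [[5, 6, 7, 8]])
result = flatten_pair(lst)
print(result)
```

Reading the comprehension from left to right, "for l in lst" corresponds to the outer loop over the two parts of the tuple, "for b in l" to the loop over the yearly lists, and "for a in b" to the loop over the individual values.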
    

    UPDATE: to convert the whole list of tuples, wrap the same expression in an outer comprehension:

    lst = [([[0,1,2],[3,4]], [[5,6,7,8]]),
           ([[0,1,2],[3,4],   [5,6,7,8]],[[9,10]]),
           ([[0,1,2],[3,4],   [5,6,7,8],  [9,10]],[[11,12,13]]),
           ([[0,1,2],[3,4],   [5,6,7,8],  [9,10],  [11,12,13]],[[14,15,16]])]
    
    # Flatten both parts of every (train, test) tuple by one level.
    ls = [tuple([[a for b in l for a in b] for l in tt]) for tt in lst]
    
    print(ls)
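
As an alternative, the flat splits could be built directly from list_of_lists, so that no conversion step is needed afterwards. The following is a minimal sketch using itertools.chain.from_iterable from the standard library. It assumes list_of_lists holds the six yearly lists from the question, and it starts the range at 2 so that the result matches the four splits shown there (the range(1, ...) in the question would produce one extra first split that trains on the first year only).

```python
from itertools import chain

# Assumed example data: one inner list of observations per year.
list_of_lists = [[0, 1, 2], [3, 4], [5, 6, 7, 8], [9, 10], [11, 12, 13], [14, 15, 16]]

# Walk-forward splits: train on all years before i, test on year i.
cv_split = [
    (list(chain.from_iterable(list_of_lists[:i])), list(list_of_lists[i]))
    for i in range(2, len(list_of_lists))
]

for train, test in cv_split:
    print(train, test)
```

A list of (train, test) pairs in this flat form is what the question describes as working, so it should be usable as cv=cv_split in GridSearchCV.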