I have time series data and want to do walk-forward cross-validation for my ML model in Python. To create the splits, I did the following:
cv_split = [(list_of_lists[:i], list_of_lists[i:i+1]) for i in range(1, len(list_of_lists))]
Here list_of_lists is, e.g., [[0,1,2],[3,4],[5,6,7,8], ...], where each inner list stands for the observations in a particular year.
The resulting cv_split is a list of tuples whose elements are lists of lists; each tuple looks like ([[0,1,2],[3,4]], [[5,6,7,8]]). This is the problem, because GridSearchCV does not accept this form.
I know that the following form for cv_split will work: ([0,1,2,3,4], [5,6,7,8]) (a list of tuples of flat lists).
What I am struggling with is how to get from ([[0,1,2],[3,4]], [[5,6,7,8]]) to ([0,1,2,3,4], [5,6,7,8]).
Here is a more complete example.
Now I have:
[([[0,1,2],[3,4]], [[5,6,7,8]]),
 ([[0,1,2],[3,4],[5,6,7,8]], [[9,10]]),
 ([[0,1,2],[3,4],[5,6,7,8],[9,10]], [[11,12,13]]),
 ([[0,1,2],[3,4],[5,6,7,8],[9,10],[11,12,13]], [[14,15,16]])]
And I need the following form:
[([0,1,2,3,4], [5,6,7,8]),
 ([0,1,2,3,4,5,6,7,8], [9,10]),
 ([0,1,2,3,4,5,6,7,8,9,10], [11,12,13]),
 ([0,1,2,3,4,5,6,7,8,9,10,11,12,13], [14,15,16])]
I am new to Python and would appreciate any help, ideally with some explanation.
You can flatten each element of the tuple with a nested list comprehension:
lst = ([[0,1,2],[3,4]], [[5,6,7,8]])
# for each element l of the tuple, merge its inner lists b into one flat list
t = tuple([[a for b in l for a in b] for l in lst])
print(t)
Output:
([0, 1, 2, 3, 4], [5, 6, 7, 8])
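Since the question asks for some explanation, the one-liner above can be unrolled into explicit loops. This is only an equivalent sketch; the helper name flatten_pair is made up for illustration and is not part of any library.

```python
def flatten_pair(pair):
    """Flatten each element of a (train, test) pair by one level.

    flatten_pair is a hypothetical helper name used only for this sketch.
    """
    flat = []
    for l in pair:          # l is a list of lists, e.g. [[0, 1, 2], [3, 4]]
        merged = []
        for b in l:         # b is one inner list, e.g. [0, 1, 2]
            for a in b:     # a is a single value, e.g. 0
                merged.append(a)
        flat.append(merged)
    return tuple(flat)


lst = ([[0, 1, 2], [3, 4]], [[5, 6, 7, 8]])
print(flatten_pair(lst))  # ([0, 1, 2, 3, 4], [5, 6, 7, 8])
```

The order of the for clauses in the comprehension (for b in l, then for a in b) matches the order of the nested loops here, read from the outside in.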
UPDATE: To apply the same flattening to every tuple in the list:
lst = [([[0,1,2],[3,4]], [[5,6,7,8]]),
       ([[0,1,2],[3,4],[5,6,7,8]], [[9,10]]),
       ([[0,1,2],[3,4],[5,6,7,8],[9,10]], [[11,12,13]]),
       ([[0,1,2],[3,4],[5,6,7,8],[9,10],[11,12,13]], [[14,15,16]])]
# flatten both elements of every (train, test) tuple
ls = [tuple([[a for b in l for a in b] for l in tt]) for tt in lst]
print(ls)
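If the original list_of_lists is still available, it may be simpler to build the flat splits directly instead of flattening them afterwards. The following is a minimal sketch using itertools.chain.from_iterable from the standard library. The sample data is taken from the question, and the helper name flat is an assumption made for illustration.

```python
from itertools import chain

# Sample data from the question: one inner list per year.
list_of_lists = [[0, 1, 2], [3, 4], [5, 6, 7, 8], [9, 10], [11, 12, 13], [14, 15, 16]]


def flat(groups):
    """Concatenate a list of lists into one flat list (hypothetical helper)."""
    return list(chain.from_iterable(groups))


# Train on all years before i, test on year i (walk-forward).
# The test part is a single inner list, so it needs no flattening.
cv_split = [
    (flat(list_of_lists[:i]), list(list_of_lists[i]))
    for i in range(1, len(list_of_lists))
]

for train, test in cv_split:
    print(train, test)
```

Note that this keeps the question's range(1, ...), so the first split trains on the first year only; use range(2, ...) if the first split should start with two years of training data, as in the example lists above.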