machine-learning scikit-learn regression grid-search

Using GridSearchCV with TimeSeriesSplit


I have some code that uses TimeSeriesSplit to split my data. For each split, I use ParameterGrid to loop through each parameter combination, record the best set of params, and use it to predict my X_test. You can see the code for this part at the bottom of the post.

I understand that GridSearchCV will do a lot of that work for me. If I use the following code, where does my data get split into X_train, X_test, y_train and y_test? Will using GridSearchCV with TimeSeriesSplit handle this behind the scenes, and if so, will this code accomplish the same thing as my original code at the bottom of this post? Also, I've now tried the GridSearchCV method and it has been running for almost 30 minutes without finishing. Do I have the right syntax?

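from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVR

# data is a pandas DataFrame: columns 0-7 are the features, column 8 is the target
X = data.iloc[:, 0:8]
y = data.iloc[:, 8:9]

# candidate hyperparameters to search over
parameters = [
    {'kernel': ['rbf'],
     'gamma': [.01],
     'C': [1, 10, 100]}]

gsc = GridSearchCV(SVR(), param_grid=parameters, scoring='neg_mean_absolute_error',
                   cv=TimeSeriesSplit(n_splits=2))
gsc.fit(X, y)

# mean validation score (negated MAE) for each parameter combination
means = gsc.cv_results_['mean_test_score']
for mean in means:
    print(mean)
print('end')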

Original Code Below:

from sklearn import metrics
from sklearn.model_selection import ParameterGrid, TimeSeriesSplit
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from tqdm import tqdm

# Create the time series split generator
tscv = TimeSeriesSplit(n_splits=3)

for train_index, test_index in tqdm(tscv.split(X)):

    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

    # scale the data set (scalers are fit on the training fold only)
    scaler_X = StandardScaler()
    scaler_y = StandardScaler()
    scaler_X.fit(X_train)
    scaler_y.fit(y_train)
    X_train, X_test = scaler_X.transform(X_train), scaler_X.transform(X_test)
    y_train, y_test = scaler_y.transform(y_train), scaler_y.transform(y_test)

    # optimization area - set params
    parameters = [
        {'kernel': ['rbf'],
         'gamma': [.01],
         'C': [1, 10, 100, 500, 1000]}]

    regressor = SVR()
    # loop through each of the parameters and find the best set
    for e, g in enumerate(ParameterGrid(parameters)):
        regressor.set_params(**g)
        regressor.fit(X_train, y_train.ravel())
        # note: this MAE is computed on the training fold, not on X_test
        score = metrics.mean_absolute_error(y_train.ravel(), regressor.predict(X_train))
        if e == 0:
            best_score = score
            best_params = g
        elif score < best_score:
            best_score = score
            best_params = g

    # refit the model with the best set of params
    regressor.set_params(**best_params)
    regressor.fit(X_train, y_train.ravel())

Solution

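  • You can modify the code slightly to pass the splits explicitly. (GridSearchCV also accepts the TimeSeriesSplit object itself as cv, so the original call is valid syntax.)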

    gsc = GridSearchCV(SVR(), param_grid=parameters, scoring='neg_mean_absolute_error', 
                       cv=TimeSeriesSplit(n_splits=2).split(X))
    

    You can also set the verbose parameter (for example, verbose=2) to see progress output while the search runs.
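
To address the "where does my data get split" part of the question: GridSearchCV calls the splitter internally, fits each parameter combination on the training indices of each fold, and scores it on that fold's held-out indices, so there is no need to build X_train/X_test by hand. The sketch below illustrates this on synthetic data, which is an assumption standing in for the asker's `data` (60 rows, 8 features). It also wraps the scaler in a Pipeline so that, as in the original loop, scaling is fit on each training fold only. Unlike the original loop, the target is not scaled here, and candidates are ranked by validation-fold MAE rather than training-fold MAE, so the results are not expected to match exactly.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the asker's data (assumption: 8 features, 1 target)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=60)

tscv = TimeSeriesSplit(n_splits=2)

# Each fold trains on an initial segment and validates on the block right after it
fold_sizes = [(len(train_idx), len(test_idx)) for train_idx, test_idx in tscv.split(X)]
for n_train, n_test in fold_sizes:
    print(f"train rows: {n_train}, validation rows: {n_test}")

# The scaler lives inside the pipeline, so it is refit on every training fold
pipe = Pipeline([("scale", StandardScaler()), ("svr", SVR())])

# Pipeline parameters are addressed as <step name>__<parameter name>
param_grid = {
    "svr__kernel": ["rbf"],
    "svr__gamma": [0.01],
    "svr__C": [1, 10, 100],
}

gsc = GridSearchCV(pipe, param_grid=param_grid,
                   scoring="neg_mean_absolute_error", cv=tscv, verbose=1)
gsc.fit(X, y)

print(gsc.best_params_)
print(gsc.cv_results_["mean_test_score"])  # one mean validation score per combination
```

With the default refit=True, the best combination is refit on all of X and y after the search, and gsc.predict uses that refitted model. If a final untouched test set is needed, it has to be held out before calling fit.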