I have some code that uses TimeSeriesSplit to split my data. For each split, I use ParameterGrid to loop through each parameter combination, record the best set of params, and use it to predict my X_test. You can see the code for this part at the bottom of the post.

I understand that GridSearchCV will do a lot of that work for me. If I use the following code, where does my data get split into X_train, X_test, y_train and y_test? Will using GridSearchCV with TimeSeriesSplit handle this behind the scenes, and if so, will this code accomplish the same thing as my original code at the bottom of this post? Also, I've now tried the GridSearchCV method and it has been running for almost 30 minutes without finishing. Do I have the right syntax?
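from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVR

X = data.iloc[:, 0:8]
y = data.iloc[:, 8:9]

parameters = [
    {'kernel': ['rbf'],
     'gamma': [.01],
     'C': [1, 10, 100]}]

gsc = GridSearchCV(SVR(), param_grid=parameters, scoring='neg_mean_absolute_error',
                   cv=TimeSeriesSplit(n_splits=2))
# y is a one-column DataFrame; SVR expects a 1-D target
gsc.fit(X, y.values.ravel())

means = gsc.cv_results_['mean_test_score']
for mean in means:
    print(mean)
print('end')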
Original Code Below:
from sklearn import metrics
from sklearn.model_selection import ParameterGrid, TimeSeriesSplit
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from tqdm import tqdm

# Create the time series split generator
tscv = TimeSeriesSplit(n_splits=3)
for train_index, test_index in tqdm(tscv.split(X)):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

    # scale the data set (scalers are fit on the training part only)
    scaler_X = StandardScaler()
    scaler_y = StandardScaler()
    scaler_X.fit(X_train)
    scaler_y.fit(y_train)
    X_train, X_test = scaler_X.transform(X_train), scaler_X.transform(X_test)
    y_train, y_test = scaler_y.transform(y_train), scaler_y.transform(y_test)

    # optimization area - set params
    parameters = [
        {'kernel': ['rbf'],
         'gamma': [.01],
         'C': [1, 10, 100, 500, 1000]}]
    regressor = SVR()

    # loop through each of the parameters and find the best set
    for e, g in enumerate(ParameterGrid(parameters)):
        regressor.set_params(**g)
        regressor.fit(X_train, y_train.ravel())
        # note: this score is computed on the training data
        score = metrics.mean_absolute_error(y_train.ravel(), regressor.predict(X_train))
        if e == 0:
            best_score = score
            best_params = g
        elif score < best_score:
            best_score = score
            best_params = g

    # refit the model with the best set of params
    regressor.set_params(**best_params)
    regressor.fit(X_train, y_train.ravel())
You need to modify the code slightly.
gsc = GridSearchCV(SVR(), param_grid=parameters, scoring='neg_mean_absolute_error',
                   cv=TimeSeriesSplit(n_splits=2).split(X))
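To make it concrete where the train/test split happens, here is a minimal sketch on a made-up 6-row array (the toy data and the name X_toy are assumptions, not the asker's dataset). TimeSeriesSplit.split yields pairs of train and test index arrays, and each training window ends before its test window starts. GridSearchCV iterates over exactly these pairs internally; as far as I know, its cv argument accepts either the splitter object itself or an iterable of such index pairs.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical toy data: 6 time-ordered rows, 2 features.
X_toy = np.arange(12).reshape(6, 2)

tscv = TimeSeriesSplit(n_splits=2)

# Collect the index pairs that a cross-validated search would iterate over.
splits = [(train.tolist(), test.tolist()) for train, test in tscv.split(X_toy)]

for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Each candidate parameter set is fit on the train indices of a split and scored on the matching test indices, so there is no need to build X_train and X_test by hand.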
You can also consider adding the verbose parameter to see progress output while the search runs.
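One difference from the manual loop in the question is worth noting: that loop fits a StandardScaler on each training split, while GridSearchCV(SVR(), ...) on its own does not scale anything. A minimal sketch of one way to get the per-split scaling back is to wrap the scaler and SVR in a Pipeline, with verbose set so progress is visible. The synthetic data, the step names "scale" and "svr", and the choice to leave y unscaled are all assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the real data: 60 rows, 8 features (assumption).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 8))
y_demo = X_demo[:, 0] * 2.0 + rng.normal(scale=0.1, size=60)

# The scaler is refit on the training part of every split.
pipe = Pipeline([("scale", StandardScaler()), ("svr", SVR())])

# Pipeline parameters are addressed as <step name>__<parameter>.
param_grid = {
    "svr__kernel": ["rbf"],
    "svr__gamma": [0.01],
    "svr__C": [1, 10, 100],
}

search = GridSearchCV(
    pipe,
    param_grid=param_grid,
    scoring="neg_mean_absolute_error",
    cv=TimeSeriesSplit(n_splits=2),
    verbose=1,  # prints how many fits the search will run
)
search.fit(X_demo, y_demo)  # y is 1-D here, so no ravel() is needed

print(search.best_params_)
print(search.cv_results_["mean_test_score"])
```

Two things differ from the manual loop. GridSearchCV scores each candidate on the held-out test part of a split, whereas the loop in the question scores on X_train. And with the default refit=True, the best parameter set is refit on all of the data passed to fit, so search.predict can be used directly afterwards.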