I am tuning a gradient boosted classifier using a pipeline and grid search.
My pipeline is:
pipe = make_pipeline(StandardScaler(with_std=True, with_mean=True),
                     RFE(RandomForestClassifier(), n_features_to_select=15),
                     GradientBoostingClassifier(random_state=42, verbose=True))
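make_pipeline names each step after its class in lowercase. That is where the 'gradientboostingclassifier__' prefix in the parameter grid comes from. Here is a minimal check that rebuilds the same pipeline and lists the generated step names:

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipe = make_pipeline(
    StandardScaler(with_std=True, with_mean=True),
    RFE(RandomForestClassifier(), n_features_to_select=15),
    GradientBoostingClassifier(random_state=42, verbose=True),
)

# Step names are generated from the lowercased class names.
step_names = list(pipe.named_steps)
print(step_names)  # ['standardscaler', 'rfe', 'gradientboostingclassifier']

# Nested parameters are addressed as <step name>__<parameter name>.
print(pipe.get_params()["gradientboostingclassifier__max_depth"])  # 3, the default
```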
The parameter grid is:
tuned_parameters = [{'gradientboostingclassifier__max_depth': range(3, 5),
                     'gradientboostingclassifier__min_samples_split': range(4, 6),
                     'gradientboostingclassifier__learning_rate': np.linspace(0.1, 1, 10)}]
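range excludes its upper bound. As a result, this grid only tries max_depth values 3 and 4, where 3 is the GradientBoostingClassifier default, and min_samples_split values 4 and 5. ParameterGrid from sklearn.model_selection enumerates the candidate combinations that GridSearchCV will evaluate:

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

tuned_parameters = [{
    "gradientboostingclassifier__max_depth": range(3, 5),
    "gradientboostingclassifier__min_samples_split": range(4, 6),
    "gradientboostingclassifier__learning_rate": np.linspace(0.1, 1, 10),
}]

grid_points = list(ParameterGrid(tuned_parameters))
print(len(grid_points))  # 2 * 2 * 10 = 40 candidates

# Only two depths are ever tried, and one of them is the default.
depths = sorted({p["gradientboostingclassifier__max_depth"] for p in grid_points})
print(depths)  # [3, 4]
```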
The grid search is run as follows:
grid = GridSearchCV(pipe, tuned_parameters, cv=5, scoring='accuracy', refit=True)
grid.fit(X_train, y_train)
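The setup can be reproduced in a self-contained sketch. Several settings differ from the question's to keep the run short. Synthetic data from make_classification stands in for X_train and y_train. The random forest inside RFE is shrunk to 10 trees. The learning-rate grid is cut to two values, cv is 3, and verbose is off.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data replaces the original X_train / y_train.
X_train, y_train = make_classification(n_samples=200, n_features=20, random_state=0)

pipe = make_pipeline(
    StandardScaler(),  # with_mean=True and with_std=True are the defaults
    # Assumption: a small forest so that RFE runs quickly.
    RFE(RandomForestClassifier(n_estimators=10, random_state=0), n_features_to_select=15),
    GradientBoostingClassifier(random_state=42),
)

# Reduced version of the question's grid (learning rates cut to two values).
tuned_parameters = [{
    "gradientboostingclassifier__max_depth": range(3, 5),
    "gradientboostingclassifier__min_samples_split": range(4, 6),
    "gradientboostingclassifier__learning_rate": np.linspace(0.1, 1, 2),
}]

grid = GridSearchCV(pipe, tuned_parameters, cv=3, scoring="accuracy", refit=True)
grid.fit(X_train, y_train)

print(grid.best_params_)
print(grid.best_estimator_.named_steps["gradientboostingclassifier"])
```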
After fitting the model on the training data, I checked grid.best_estimator_. It shows only two of the three parameters I am tuning (learning_rate and min_samples_split). The max_depth parameter does not appear in the best estimator:
grid.best_estimator_.named_steps['gradientboostingclassifier']
# output: GradientBoostingClassifier(learning_rate=0.9, min_samples_split=5,
#                                    random_state=42, verbose=True)
However, I can also use grid.cv_results_ to find the best 'mean_test_score' and look up the parameters for that score. That result does include max_depth:
inde = np.where(grid.cv_results_['mean_test_score'] == max(grid.cv_results_['mean_test_score']))
grid.cv_results_['params'][inde[-1][0]]
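GridSearchCV already records this lookup. best_index_ is the position of the best candidate in cv_results_, and best_params_ is cv_results_['params'][best_index_]. The sketch below checks that the np.where lookup above and the built-in attributes agree. To keep it fast, it assumes synthetic data, drops the RFE step, and uses a small grid.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data and a trimmed pipeline/grid, for speed only.
X_train, y_train = make_classification(n_samples=200, n_features=10, random_state=0)
pipe = make_pipeline(StandardScaler(), GradientBoostingClassifier(random_state=42))
tuned_parameters = [{
    "gradientboostingclassifier__max_depth": range(3, 5),
    "gradientboostingclassifier__learning_rate": [0.1, 0.9],
}]
grid = GridSearchCV(pipe, tuned_parameters, cv=3, scoring="accuracy").fit(X_train, y_train)

scores = grid.cv_results_["mean_test_score"]
# Manual approach: first index holding the maximum mean test score.
inde = np.where(scores == max(scores))
manual_params = grid.cv_results_["params"][inde[-1][0]]

# Built-in equivalents.
print(grid.best_index_ == inde[-1][0])
print(manual_params == grid.best_params_)
```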
# output: {'gradientboostingclassifier__learning_rate': 0.9,
#          'gradientboostingclassifier__max_depth': 3,
#          'gradientboostingclassifier__min_samples_split': 5}
My question is: if I use the trained pipeline (the object named grid in my case), will it still use the max_depth parameter or not?
Or is it better to use the parameters that gave the best 'mean_test_score', taken from grid.cv_results_?
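Your pipeline has been tuned on all three parameters that you specified. The best value for max_depth simply happens to be the default value, 3. When an estimator is printed, parameters left at their default values are omitted by default. Because you fitted with refit=True, grid uses best_estimator_, which was refit with max_depth=3, so the parameter is still applied. Compare the following outputs: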
print(GradientBoostingClassifier(max_depth=3)) # default
# output: GradientBoostingClassifier()
print(GradientBoostingClassifier(max_depth=5)) # not default
# output: GradientBoostingClassifier(max_depth=5)
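Hiding the default is only a display choice, and the parameter is still set on the object. get_params() returns every parameter, defaults included. The print_changed_only option, here applied temporarily through sklearn.config_context, makes the printed representation list all parameters. In recent scikit-learn versions this option defaults to True, which is why max_depth=3 is hidden.

```python
import sklearn
from sklearn.ensemble import GradientBoostingClassifier

clf = GradientBoostingClassifier(max_depth=3, learning_rate=0.9)

# The attribute is set even though its default value is not printed.
print(clf)                            # max_depth is not shown
print(clf.max_depth)                  # 3
print(clf.get_params()["max_depth"])  # 3

# Temporarily print every parameter, including those left at their defaults.
with sklearn.config_context(print_changed_only=False):
    print(clf)
```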
In general, it is best practice to read the best parameters from the best_params_ attribute of the fitted GridSearchCV object, since it always includes every tuned parameter:
grid.best_params_
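You can confirm that every entry of best_params_, printed or not, is set on the refit estimator that grid uses for prediction. The check below assumes synthetic data and uses a trimmed pipeline and grid, chosen only to keep it fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data and a reduced grid, only to keep the example fast.
X_train, y_train = make_classification(n_samples=200, n_features=10, random_state=0)
pipe = make_pipeline(StandardScaler(), GradientBoostingClassifier(random_state=42))
grid = GridSearchCV(
    pipe,
    {"gradientboostingclassifier__max_depth": [3, 4],
     "gradientboostingclassifier__learning_rate": [0.1, 0.9]},
    cv=3, scoring="accuracy", refit=True,
).fit(X_train, y_train)

best_gbc = grid.best_estimator_.named_steps["gradientboostingclassifier"]
for name, value in grid.best_params_.items():
    _, param = name.split("__", 1)
    # Each tuned value is set on the refit estimator, whether or not it is printed.
    assert getattr(best_gbc, param) == value

# grid.predict delegates to the refit best_estimator_.
assert (grid.predict(X_train) == grid.best_estimator_.predict(X_train)).all()
print(grid.best_params_)
```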