python scikit-learn cross-validation k-fold

Does GridSearchCV return the best_estimator_ after fitting?


Let's say we tune an SVM with GridSearch like this:

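from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# X, y: feature matrix and labels, assumed to be defined already
algorithm = SVC()
parameters = {'kernel': ['rbf', 'sigmoid'], 'C': [0.1, 1, 10]}

grid = GridSearchCV(algorithm, parameters)
grid.fit(X, y)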

Suppose you then want to use the best-fit parameters/estimator in cross_val_score. My question is: which model does grid represent at this point? Is it the best-performing one? In other words, can we just do

cross_val_scores = cross_val_score(grid, X=X, y=y)

or should we use

cross_val_scores = cross_val_score(grid.best_estimator_, X=X, y=y)

When I run both, I find that they do not return the same scores so I am curious what the correct approach is here. (I would assume using the best_estimator_.) That raises another question, though, namely: what does using just grid use as a model then? The first one?


Solution

  • You don't need cross_val_score after fitting a GridSearchCV. The fitted object already exposes the cross-validation scores: cv_results_ holds the results for every parameter combination, and you can index into it with the best_index_ attribute to see only the best candidate's results.

    import pandas as pd

    cv_results = pd.DataFrame(grid.cv_results_)
    cv_results.iloc[grid.best_index_]
    mean_fit_time                        0.00046916
    std_fit_time                         1.3785e-05
    mean_score_time                     0.000251055
    std_score_time                      1.19038e-05
    param_C                                      10
    param_kernel                                rbf
    params               {'C': 10, 'kernel': 'rbf'}
    split0_test_score                      0.966667
    split1_test_score                             1
    split2_test_score                      0.966667
    split3_test_score                      0.966667
    split4_test_score                             1
    mean_test_score                            0.98
    std_test_score                        0.0163299
    rank_test_score                               1
    Name: 5, dtype: object
    

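    Most of the methods you call on a fitted GridSearchCV delegate to the best model (grid.predict(...) gets you the predictions of the best model, for example). The search object itself is not the best model, though: grid.estimator is still the unfitted base estimator you passed in, and cross_val_score clones whatever it is given and refits it on every fold. Passing grid therefore reruns the whole grid search inside each fold and scores the model selected there, whereas passing grid.best_estimator_ cross-validates a single model with the parameters that were already chosen. The difference you see probably comes from that.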
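The distinction above can be sketched as follows. This is a minimal illustration rather than the asker's actual setup: the iris dataset stands in for X and y, cv=5 is set explicitly, and the variable names (best_mean, nested_scores, fixed_scores) are made up for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Assumption: iris stands in for the X, y of the question.
X, y = load_iris(return_X_y=True)

parameters = {"kernel": ["rbf", "sigmoid"], "C": [0.1, 1, 10]}
grid = GridSearchCV(SVC(), parameters, cv=5)
grid.fit(X, y)

# The winning candidate's mean CV score is already stored on the search object.
best_mean = grid.cv_results_["mean_test_score"][grid.best_index_]
print("best mean CV score:", best_mean, "best_score_:", grid.best_score_)

# grid.estimator is the base estimator as passed in (never fitted);
# grid.best_estimator_ is a refitted clone carrying the selected parameters.
print("base params:", grid.estimator.get_params()["C"], grid.estimator.get_params()["kernel"])
print("best params:", grid.best_params_)

# Passing the search object: the full grid search is rerun on the training
# part of every outer fold (nested cross-validation).
nested_scores = cross_val_score(grid, X, y, cv=5)

# Passing best_estimator_: one model with fixed parameters is cross-validated.
fixed_scores = cross_val_score(grid.best_estimator_, X, y, cv=5)

print("nested:", nested_scores.mean())
print("fixed: ", fixed_scores.mean())
```

With the same deterministic splits, the second call should essentially reproduce the per-split scores that are already in cv_results_, which is why the extra cross_val_score is not needed. The nested variant answers a different question: how well the whole tuning procedure generalizes, rather than how well one particular parameter setting does.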