Tags: python, scikit-learn, grid-search, gridsearchcv

If GridSearchCV gives a few estimators with rank 1, which one will it pick as the best estimator?


When using scikit-learn's GridSearchCV, if several candidates share rank 1 (in rank_test_score), which one does it pick as best_estimator_? Will it pick the first one that appears in cv_results_?

Unless I am going blind, I can't find this in the documentation here: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

Many thanks in advance.


Solution

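  • Yes, it (currently) picks the first one in cv_results_. In the source, the best index is simply the argmin of the rank_test_score array (rank_test_<name> for the refit metric in a multi-metric search), and according to the NumPy docs, argmin returns the index of the first occurrence when there are ties. Since tied scores share the same rank, the earliest tied candidate in cv_results_ wins.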

    (There doesn't seem to be any principled reason to prefer the first candidate, so this may well change. In particular, a tie-breaker based on the lowest standard deviation, train score, fit time, etc. seems worthwhile.)

    As a quick experiment, use a hyperparameter that has no effect on performance, so that every candidate ties:

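    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    X, y = load_iris(return_X_y=True)
    search = GridSearchCV(estimator=LogisticRegression(),
                          param_grid={'verbose': [0, 1, 2]})
    search.fit(X, y)
    print(search.cv_results_['rank_test_score'], search.best_params_)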
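The rank-then-argmin behaviour can also be reproduced in isolation, without fitting anything. This is a minimal sketch, assuming (as the scikit-learn source appears to do) that ranks are computed from negated mean scores with scipy's rankdata using method="min", so tied scores share the same rank; the score values are made up for illustration.

```python
import numpy as np
from scipy.stats import rankdata

# Made-up mean test scores for four candidates; candidates 1 and 2 tie.
mean_test_score = np.array([0.90, 0.95, 0.95, 0.80])

# Higher score = better rank; "min" gives tied scores the same (lowest) rank.
rank_test_score = rankdata(-mean_test_score, method="min").astype(int)

# np.argmin returns the first index among equal minima, i.e. the first tied candidate.
best_index = int(np.argmin(rank_test_score))

print(rank_test_score, best_index)  # → [3 1 1 4] 1
```

Both candidate 1 and candidate 2 have rank 1, but argmin only ever returns the earlier index.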
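If you want a deterministic tie-breaker today rather than relying on argmin, GridSearchCV's refit parameter also accepts a callable that receives cv_results_ and returns the integer to use as best_index_. The sketch below breaks rank-1 ties by the lowest std_test_score; the function name best_rank_then_lowest_std and the grid of C values are hypothetical choices for illustration. Note that when refit is a callable, best_score_ is not set.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV


def best_rank_then_lowest_std(cv_results):
    """Hypothetical tie-breaker: among rank-1 candidates, pick the lowest std_test_score."""
    ranks = np.asarray(cv_results["rank_test_score"])
    stds = np.asarray(cv_results["std_test_score"])
    # Indices of all candidates sharing the best (lowest) rank.
    tied = np.flatnonzero(ranks == ranks.min())
    # If the stds also tie, argmin again falls back to the first such candidate.
    return int(tied[np.argmin(stds[tied])])


X, y = load_iris(return_X_y=True)
search = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    refit=best_rank_then_lowest_std,  # callable refit chooses best_index_
)
search.fit(X, y)
print(search.best_index_, search.best_params_)
```

The same pattern works for other tie-breakers mentioned above, e.g. mean_fit_time, by swapping the column used in the second argmin.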