Tags: python, machine-learning, scikit-learn

Choosing top k models using GridSearchCV in scikit-learn


Is there an easy/pre-existing way to perform a Grid Search in scikit-learn and then automatically return the top k best performing models or automatically average their outputs? I intend to try and reduce overfitting this way. I have not yet found anything related to this.

EDIT: To clarify, I know about sklearn's GridSearch, I am looking for an option to perform a Grid Search and then return the top k best performing models or average over them, rather than just returning the best single model.


Solution

  • If you have your fitted GridSearchCV object as grid, grid.cv_results_ holds the results for every parameter combination that was tried. I usually load it into a pandas DataFrame and sort it by rank:

    import pandas as pd
    results = pd.DataFrame(grid.cv_results_)
    results.sort_values(by='rank_test_score', inplace=True)
    

    Then you can get the parameters of each model from the params column. For example, to build an estimator with the second-best parameters:

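    from sklearn.base import clone
    params_2nd_best = results.iloc[1]['params']
    # clone the base estimator so grid.best_estimator_ is not modified in place; fit the clone before predicting
    clf_2nd_best = clone(grid.estimator).set_params(**params_2nd_best)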
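The same idea extends to the top k models and to averaging their outputs, which is what the question asks for. The following is a minimal sketch, not a built-in scikit-learn feature: the toy dataset, the LogisticRegression estimator, the parameter grid, and the names k, top_k_models and avg_proba are all assumptions chosen for illustration. It refits a fresh clone for each of the k best parameter sets and averages their predicted class probabilities (soft voting).

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Toy data and grid, purely for illustration (assumptions, not from the question).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
grid.fit(X_train, y_train)

k = 3  # number of top-ranked parameter sets to keep
results = pd.DataFrame(grid.cv_results_).sort_values(by="rank_test_score")
top_k_params = results["params"].head(k).tolist()

# Refit a fresh, unfitted clone of the base estimator for each parameter set.
top_k_models = [
    clone(grid.estimator).set_params(**params).fit(X_train, y_train)
    for params in top_k_params
]

# Average the predicted class probabilities across the k models.
avg_proba = np.mean([m.predict_proba(X_test) for m in top_k_models], axis=0)
y_pred = top_k_models[0].classes_[np.argmax(avg_proba, axis=1)]
```

For a regressor, averaging the outputs of predict directly would play the same role. Whether this actually reduces overfitting depends on the data, so it is worth checking the averaged predictions on a held-out set.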