Is there an easy/pre-existing way to perform a Grid Search in scikit-learn and then automatically return the top k best performing models or automatically average their outputs? I intend to try and reduce overfitting this way. I have not yet found anything related to this.
EDIT: To clarify, I know about sklearn's GridSearchCV. I am looking for an option to perform a Grid Search and then return the top k best performing models or average over them, rather than just returning the single best model.
If your fitted GridSearchCV object is called grid, the results for every parameter combination are available in grid.cv_results_. I usually load it into a pandas DataFrame and sort it by rank:
import pandas as pd
results = pd.DataFrame(grid.cv_results_)
results.sort_values(by='rank_test_score', inplace=True)
Then you can get the parameters for each model from the params column. For example, to build a model with the 2nd-best parameters:
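from sklearn.base import clone
params_2nd_best = results.iloc[1]['params']
# Clone first: set_params would otherwise modify grid.best_estimator_ in place. The clone is unfitted, so call .fit() before predicting.
clf_2nd_best = clone(grid.best_estimator_).set_params(**params_2nd_best)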
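As far as I know, GridSearchCV has no built-in option to return or average the top k models, so you have to assemble that yourself from cv_results_. Below is a minimal sketch of one way to do it, assuming a classifier that supports predict_proba. The iris data, the LogisticRegression grid, and k = 3 are placeholders chosen only to make the example runnable, and averaging predicted probabilities (soft voting) is just one possible way to combine the models.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Toy data and an arbitrary grid, purely for illustration.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
grid.fit(X_train, y_train)

# Best-ranked parameter combinations first.
results = pd.DataFrame(grid.cv_results_).sort_values(by="rank_test_score")

k = 3  # hypothetical choice of how many models to keep
top_k_params = results["params"].head(k).tolist()

# Clone the unfitted base estimator for each parameter set and refit it
# on the full training data.
top_k_models = [
    clone(grid.estimator).set_params(**params).fit(X_train, y_train)
    for params in top_k_params
]

# Soft voting: average the predicted class probabilities, then pick the
# class with the highest mean probability.
avg_proba = np.mean([m.predict_proba(X_test) for m in top_k_models], axis=0)
y_pred = top_k_models[0].classes_[avg_proba.argmax(axis=1)]
```

For a regressor you would average the predict outputs directly instead of the probabilities. Whether this actually reduces overfitting depends on how different the top k models are, so it is worth checking the averaged predictions on held-out data.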