I'm using RandomForestClassifier in sklearn, with GridSearchCV to find the best estimator.
When several estimators (from simple to complex) get the same score in GridSearchCV, which one does GridSearchCV return? The simplest one, or a random one?
GridSearchCV does not assess model complexity (though that would be a neat feature). Neither does it choose among the best models randomly.
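Instead, GridSearchCV simply performs an np.argmin() on the stored results (in recent versions, on the rank of each candidate's mean test score, where tied candidates share the same rank). See the corresponding line in the source code.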
Now, according to the NumPy docs,
In case of multiple occurrences of the minimum values, the indices corresponding to the first occurrence are returned.
That is, GridSearchCV will always select the first among the best models, in the order in which the parameter combinations were evaluated.
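
A quick way to check this behaviour yourself is to force a tie. The sketch below is only an illustration, not something from the sklearn docs: it assumes a DummyClassifier with strategy="most_frequent", for which random_state has no effect on the predictions, so every candidate in the (made-up) grid gets exactly the same cross-validation score.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import GridSearchCV

# Toy data (assumption for illustration): the features carry no information.
X = np.zeros((30, 1))
y = np.array([0] * 20 + [1] * 10)

# With strategy="most_frequent", random_state does not change the predictions,
# so all three candidates are guaranteed to tie. The values are listed in a
# deliberately unsorted order to show that position, not value, decides.
param_grid = {"random_state": [3, 1, 2]}

search = GridSearchCV(DummyClassifier(strategy="most_frequent"), param_grid, cv=3)
search.fit(X, y)

mean_scores = search.cv_results_["mean_test_score"]
tied = bool(np.all(mean_scores == mean_scores[0]))

print("all candidates tied:", tied)
print("best_index_:", search.best_index_)
print("best_params_:", search.best_params_)
```

If the tie is resolved as described above, best_index_ should point at the first candidate in the grid (random_state=3 here), regardless of how "simple" or "complex" it is. So if you prefer simpler models on ties, one practical option is to list the simpler settings first in the grid.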