Let's say that I create a RandomizedSearchCV like so:
from sklearn import model_selection
from sklearn.ensemble import RandomForestClassifier

searcher = model_selection.RandomizedSearchCV(estimator=RandomForestClassifier(),
                                              param_distributions=random_grid,
                                              n_iter=20,   # Number of parameter combinations to try
                                              cv=3,        # Number of folds for k-fold cross-validation
                                              n_jobs=-1)   # Use all processors to compute in parallel
search = searcher.fit(x_train, y_train)
search.best_params_
n_iter tells us how many parameter combinations the search will test. It is important for me to know whether the default model parameters are included, either as one of those 20 combinations or in addition to them. Does anyone know if this is the case?
They are not (and arguably, it would be strange if this was the case).
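One way to see why: as I read the scikit-learn source, RandomizedSearchCV generates its n_iter candidates by passing param_distributions to ParameterSampler, which only samples from the distributions you give it and knows nothing about the estimator's own defaults. A minimal sketch calling ParameterSampler directly, with the same distributions as the docs example that follows (variable names are just for illustration):

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

# Same search space as the docs example: continuous C, discrete penalty.
distributions = dict(C=uniform(loc=0, scale=4), penalty=['l2', 'l1'])

# Draw 10 candidate settings, as a randomized search with n_iter=10 would.
# Every value comes from `distributions`; no estimator defaults are involved.
candidates = list(ParameterSampler(distributions, n_iter=10, random_state=0))

for params in candidates:
    print(params)
```

Because C is drawn from a continuous distribution, the chance of any candidate hitting exactly C=1.0 is essentially zero.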
The exact parameter combinations tried are stored in the cv_results_ attribute of the fitted RandomizedSearchCV object. Adapting the example from the docs (which uses the default n_iter=10), we get:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform
iris = load_iris()
logistic = LogisticRegression(solver='saga', tol=1e-2, max_iter=200,
                              random_state=0)
distributions = dict(C=uniform(loc=0, scale=4),
                     penalty=['l2', 'l1'])
clf = RandomizedSearchCV(logistic, distributions, random_state=0)
search = clf.fit(iris.data, iris.target)
search.cv_results_
You can inspect the dictionary stored in search.cv_results_ directly, or load it into a pandas DataFrame for a more compact representation:
import pandas as pd
df = pd.DataFrame.from_dict(search.cv_results_)
df['params']
# result:
0 {'C': 2.195254015709299, 'penalty': 'l1'}
1 {'C': 3.3770629943240693, 'penalty': 'l1'}
2 {'C': 2.1795327319875875, 'penalty': 'l1'}
3 {'C': 2.4942547871438894, 'penalty': 'l2'}
4 {'C': 1.75034884505077, 'penalty': 'l2'}
5 {'C': 0.22685190926977272, 'penalty': 'l2'}
6 {'C': 1.5337660753031108, 'penalty': 'l2'}
7 {'C': 3.2486749151019727, 'penalty': 'l2'}
8 {'C': 2.2721782443757292, 'penalty': 'l1'}
9 {'C': 3.34431505414951, 'penalty': 'l2'}
from which it is clear that the default value C=1.0 for LogisticRegression was not among the sampled candidates.
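Rather than reading the list by eye, the check can also be done programmatically. The sketch below refits the docs example (only to stay self-contained), compares each row of cv_results_ against the defaults reported by get_params() on a fresh LogisticRegression, and shows each candidate's mean CV score and rank alongside. The is_default column name is an illustrative choice, not part of scikit-learn.

```python
import pandas as pd
from scipy.stats import uniform
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

iris = load_iris()
logistic = LogisticRegression(solver='saga', tol=1e-2, max_iter=200, random_state=0)
distributions = dict(C=uniform(loc=0, scale=4), penalty=['l2', 'l1'])
search = RandomizedSearchCV(logistic, distributions, random_state=0).fit(iris.data, iris.target)

# Default values of the searched hyperparameters, read from an unfitted estimator.
defaults = {name: LogisticRegression().get_params()[name] for name in distributions}

# One row per sampled candidate; flag rows whose params equal the defaults.
df = pd.DataFrame.from_dict(search.cv_results_)
df['is_default'] = [params == defaults for params in df['params']]

print(df[['params', 'mean_test_score', 'rank_test_score', 'is_default']])
print("default combination evaluated:", df['is_default'].any())
```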
If you need to assess the performance of the model with its default parameters, you should do so separately; this is straightforward (just 2 lines of code).
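For example, a minimal sketch reusing the docs example: cross_val_score scores the same estimator with C and penalty left at their defaults. RandomizedSearchCV uses 5-fold CV by default (scikit-learn 0.22 and later), stratified for classifiers, so passing cv=5 here should give the same folds and make the baseline directly comparable to search.best_score_. The search is refit only to keep the snippet self-contained.

```python
from scipy.stats import uniform
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, cross_val_score

iris = load_iris()
logistic = LogisticRegression(solver='saga', tol=1e-2, max_iter=200, random_state=0)
distributions = dict(C=uniform(loc=0, scale=4), penalty=['l2', 'l1'])

# The search clones `logistic`, so the original object keeps its default C and penalty.
search = RandomizedSearchCV(logistic, distributions, random_state=0).fit(iris.data, iris.target)

# The "2 lines": score the default-parameter model with the same 5-fold CV.
baseline_scores = cross_val_score(logistic, iris.data, iris.target, cv=5)
print("default params, mean CV accuracy:", baseline_scores.mean())

print("best sampled params:", search.best_params_, "mean CV accuracy:", search.best_score_)
```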