python, scikit-learn, cross-validation

Does RandomizedSearchCV automatically include the default model parameters that are passed to the constructor?


Let's say that I create a RandomizedSearchCV like so:

from sklearn import model_selection
from sklearn.ensemble import RandomForestClassifier

# random_grid, x_train and y_train are assumed to be defined elsewhere
searcher = model_selection.RandomizedSearchCV(estimator = RandomForestClassifier(),
                                            param_distributions = random_grid,
                                            n_iter = 20, # Number of parameter combinations to try
                                            cv     = 3,  # Number of folds for k-fold validation
                                            n_jobs = -1) # Use all processors to compute in parallel
search = searcher.fit(x_train, y_train)
search.best_params_

n_iter tells us how many combinations the search will test. It would be important to me to know whether the default model parameters are also evaluated, either as one of the 20 combinations or in addition to them. Does anyone know whether this is the case?


Solution

  • They are not (and arguably, it would be strange if this were the case).

    The detailed values of the parameter combinations tried are returned in the attribute cv_results_ of the fitted RandomizedSearchCV object. Adapting the example from the docs (using the default n_iter = 10), we get:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import RandomizedSearchCV
    from scipy.stats import uniform
    
    iris = load_iris()
    logistic = LogisticRegression(solver='saga', tol=1e-2, max_iter=200,
                                   random_state=0)
    distributions = dict(C=uniform(loc=0, scale=4),
                          penalty=['l2', 'l1'])
    clf = RandomizedSearchCV(logistic, distributions, random_state=0)
    
    search = clf.fit(iris.data, iris.target)
    search.cv_results_
    

    You can directly inspect the dictionary returned by search.cv_results_, or you can import it into a pandas dataframe for a more compact representation:

    import pandas as pd
    df = pd.DataFrame.from_dict(search.cv_results_)
    df['params']
    # result:
    0      {'C': 2.195254015709299, 'penalty': 'l1'}
    1     {'C': 3.3770629943240693, 'penalty': 'l1'}
    2     {'C': 2.1795327319875875, 'penalty': 'l1'}
    3     {'C': 2.4942547871438894, 'penalty': 'l2'}
    4       {'C': 1.75034884505077, 'penalty': 'l2'}
    5    {'C': 0.22685190926977272, 'penalty': 'l2'}
    6     {'C': 1.5337660753031108, 'penalty': 'l2'}
    7     {'C': 3.2486749151019727, 'penalty': 'l2'}
    8     {'C': 2.2721782443757292, 'penalty': 'l1'}
    9       {'C': 3.34431505414951, 'penalty': 'l2'}
    

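    From this it is clear that the default value C=1.0 of LogisticRegression was not among the sampled candidates. Since C is drawn from a continuous distribution here, sampling exactly 1.0 is practically impossible anyway.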
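The same check can also be done programmatically rather than by eye. The following is only a minimal sketch: it reuses the estimator settings from the example above but, as a simplifying assumption, searches over C alone; the names default_C, sampled_C and default_included are illustrative and not part of scikit-learn.

```python
from scipy.stats import uniform
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)
logistic = LogisticRegression(solver='saga', tol=1e-2, max_iter=200,
                              random_state=0)

# Assumption for this sketch: only C is searched (default n_iter = 10)
clf = RandomizedSearchCV(logistic, {'C': uniform(loc=0, scale=4)},
                         random_state=0)
search = clf.fit(X, y)

# Default value of C, read from a freshly constructed estimator
default_C = LogisticRegression().get_params()['C']

# Every value of C that the search actually evaluated
sampled_C = [params['C'] for params in search.cv_results_['params']]

# True only if some sampled candidate equals the default exactly
default_included = any(c == default_C for c in sampled_C)
print(default_C, default_included)
```

If default_included comes out as False, the default configuration was never evaluated by the search.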

    If you need to assess the performance of the model with its default parameters, do so separately; it is straightforward and takes only a couple of lines of code.
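One possible way to get such a baseline is cross_val_score, applied to the estimator exactly as it would be passed to the search. This is a sketch rather than the only option; it assumes the iris data and estimator settings from the example above, and the name baseline_scores is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# The estimator as constructed, i.e. with C and penalty left at their defaults
logistic = LogisticRegression(solver='saga', tol=1e-2, max_iter=200,
                              random_state=0)

# cv=5 matches the default number of folds used by RandomizedSearchCV
baseline_scores = cross_val_score(logistic, X, y, cv=5)
print(baseline_scores.mean())
```

Using the same cv value as the search keeps the comparison with search.best_score_ on an equal footing, since both then score the model on the same kind of stratified folds.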