How to replicate GridSearchCV result?

Using GridSearchCV, I am trying to maximize AUC for a LogisticRegression classifier:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

clf_log = LogisticRegression(C=1, random_state=0).fit(X_train, y_train)

grid_params = {'penalty': ['l1','l2'], 'C': [0.001,0.01,0.1,1,10,100], 'max_iter' : [100]}
gs = GridSearchCV(clf_log, grid_params, scoring='roc_auc', cv=5)
gs.fit(X_train, y_train)

I got a gs.best_score_ of 0.7630647186779661, with gs.best_estimator_ and gs.best_params_ as follows, respectively:

<< LogisticRegression(C=10, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=0, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False) >>

{'C': 10, 'max_iter': 100, 'penalty': 'l2'}

However, when I plugged these params back into my original clf_log, I only got an AUC of 0.5359918677005525. What am I missing (I suspect the CV part)? How can I replicate the same result? Thanks!


  • GridSearchCV uses k-fold cross-validation: when you call the fit method, it splits the data into a training part and a test part (cv=5 means each test part is 1/5 of the dataset), and it repeats this cv times (5 in this case) so that every fold serves as the test set once. So you shouldn't pass X_train and y_train; pass X and y instead (assuming you don't want a third validation set), since the splitting is done internally, i.e. gs.fit(X, y).

    After this, let's say your best parameters are {'C': 10, 'max_iter': 100, 'penalty': 'l2'}. If you want to replicate the score reported by GridSearchCV with these parameters, you need to use k-fold cross-validation again, with the same number of folds and the same scoring (if you use train_test_split instead, your results will vary slightly).

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    # cv=5 matches the cv used in GridSearchCV, so the folds are the same
    np.average(cross_val_score(LogisticRegression(C=10, max_iter=100, penalty='l2'), X, y, scoring='roc_auc', cv=5))

    With this you should get the same AUC. You can also refer to this video.
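
To make the point concrete, here is a minimal, self-contained sketch of the replication. It is only an illustration under stated assumptions: the data comes from make_classification as a stand-in for the real X and y, the grid searches over C only to keep it short, and max_iter=1000 is my choice so the solver converges quietly. The idea is that gs.best_score_ is the mean test score over the folds, so re-scoring the best parameters with the same scoring and the same cv should give the same number.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic stand-in for the real dataset (assumption: binary classification).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Grid over C only; the estimator settings are illustrative, not the asker's exact ones.
grid_params = {"C": [0.001, 0.01, 0.1, 1, 10, 100]}
gs = GridSearchCV(LogisticRegression(max_iter=1000), grid_params,
                  scoring="roc_auc", cv=5)
gs.fit(X, y)

# Rebuild the model from the winning parameters and score it with the
# same metric and the same number of folds as the grid search.
best_model = LogisticRegression(max_iter=1000, **gs.best_params_)
fold_scores = cross_val_score(best_model, X, y, scoring="roc_auc", cv=5)
replicated = np.mean(fold_scores)

print("GridSearchCV best_score_:", gs.best_score_)
print("cross_val_score mean:    ", replicated)
```

With an integer cv and a classifier, both calls use stratified folds without shuffling, so the splits line up and the two printed values should agree. A score computed on a single hold-out split, or with a different metric, has no reason to match best_score_.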