I ran a grid search on a logistic regression with scoring set to 'roc_auc'. grid_clf1.best_score_ gave me an AUC of 0.7557. After that I wanted to plot the ROC curve of the best model, but the ROC curve I got had an AUC of 0.50. I do not understand this at all.
I looked into the predicted probabilities and saw that they were all 0.0 or 1.0. So I think something went wrong here, but I cannot find what it is.
My code is as follows for the grid search cv:
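import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

clf1 = Pipeline([('RS', RobustScaler()),
                 ('LR', LogisticRegression(random_state=1, solver='saga'))])
params = {'LR__C': np.logspace(-3, 0, 5),
          'LR__penalty': ['l1']}
grid_clf1 = GridSearchCV(clf1, params, scoring='roc_auc', cv=5)
grid_clf1.fit(X_train, y_train)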
So this gave an AUC of 0.7557 for the best model. Then if I calculate the AUC for the model myself:
from sklearn.metrics import roc_auc_score

y_pred_proba = grid_clf1.best_estimator_.predict_proba(X_test)[:, 1]
print(roc_auc_score(y_test, y_pred_proba))
This gave me an AUC of 0.50.
It looks like there are two problems with your example code:
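1. You are comparing ROC AUC scores computed on different datasets: the training set is used during fit, and the test set is used when roc_auc_score is called.
2. Scoring with cross-validation works differently from a single roc_auc_score function call. It can be expanded to np.mean(cross_val_score(...))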
So, if you take that into account, you will get the same scoring values. You can use the colab notebook as a reference.
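A minimal sketch of that comparison. It makes two assumptions that are not in the question: synthetic data from make_classification stands in for the unseen X_train/X_test, and max_iter is raised so that saga has room to converge. With the same pipeline and the same 5-fold split, best_score_ should match the mean of cross_val_score on the training set, while the test-set AUC is a separate number computed on different data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

# Assumption: synthetic stand-in data, since the question's data is not shown.
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=1)

# Same pipeline and grid as in the question; max_iter is an added assumption.
clf1 = Pipeline([('RS', RobustScaler()),
                 ('LR', LogisticRegression(random_state=1, solver='saga',
                                           max_iter=5000))])
params = {'LR__C': np.logspace(-3, 0, 5),
          'LR__penalty': ['l1']}
grid_clf1 = GridSearchCV(clf1, params, scoring='roc_auc', cv=5)
grid_clf1.fit(X_train, y_train)

# best_score_ expanded by hand: mean ROC AUC over the same 5 folds of X_train.
cv_scores = cross_val_score(grid_clf1.best_estimator_, X_train, y_train,
                            scoring='roc_auc', cv=5)
manual_cv_auc = np.mean(cv_scores)

# A single roc_auc_score call on the held-out test set (different data).
y_pred_proba = grid_clf1.best_estimator_.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, y_pred_proba)

print('best_score_      :', grid_clf1.best_score_)
print('mean CV AUC      :', manual_cv_auc)
print('test-set ROC AUC :', test_auc)
```

The first two printed values are the ones that should agree. The third can legitimately differ, because it is measured on X_test rather than on the cross-validation folds of X_train.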