I am using XGBClassifier on the Rain in Australia dataset, trying to predict whether it will rain today or not. I wanted to tune the hyperparameters of the classifier with GridSearchCV and score it with ROC AUC. Here is my code:
param_grid = {
    "max_depth": [3, 4, 5, 7],
    "gamma": [0, 0.25, 1],
    "reg_lambda": [0, 1, 10],
    "scale_pos_weight": [1, 3, 5],
    "subsample": [0.8],          # Fix subsample
    "colsample_bytree": [0.5],   # Fix colsample_bytree
}
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

# Init the classifier
xgb_cl = xgb.XGBClassifier(objective="binary:logistic", verbosity=0)

# Init the grid search
grid_cv = GridSearchCV(xgb_cl, param_grid, scoring="roc_auc", n_jobs=-1)

# Fit
_ = grid_cv.fit(X, y)
When the search is finally done, I get the best score with .best_score_, but somehow it is an accuracy score instead of ROC AUC. I thought this was only the case with GridSearchCV, so I tried HalvingGridSearchCV and cross_val_score with scoring set to "roc_auc", but I got accuracy scores from them too. I checked this by manually computing ROC AUC with sklearn.metrics.roc_auc_score.
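For reference, my manual check was roughly along these lines (simplified; the actual features come from the Rain in Australia data, and cross_val_predict with cv=5 is just illustrative here):

from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.metrics import roc_auc_score

# what cross-validation reports with scoring="roc_auc"
cv_scores = cross_val_score(xgb_cl, X, y, scoring="roc_auc", cv=5)
print("cross_val_score (roc_auc):", cv_scores.mean())

# manual check: out-of-fold predicted probabilities passed to roc_auc_score
oof_proba = cross_val_predict(xgb_cl, X, y, cv=5, method="predict_proba")[:, 1]
print("manual roc_auc_score:", roc_auc_score(y, oof_proba))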
Is there anything I am doing wrong, or what is the reason for this behavior?
Have you tried writing your own roc_auc scoring rule? It seems like you are passing labels instead of the probabilities that roc_auc actually needs.

The problem is described here: Different result roc_auc_score and plot_roc_curve

Solutions for own scorers: Grid-Search finding Parameters for AUC
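A minimal sketch of such an own scorer, assuming sklearn's make_scorer with needs_proba=True so the scorer is fed predicted probabilities rather than hard labels, could look like this:

from sklearn.metrics import roc_auc_score, make_scorer
from sklearn.model_selection import GridSearchCV

# needs_proba=True: the scorer receives predict_proba output, not 0/1 labels
proba_auc = make_scorer(roc_auc_score, needs_proba=True)

grid_cv = GridSearchCV(xgb_cl, param_grid, scoring=proba_auc, n_jobs=-1)
grid_cv.fit(X, y)
print(grid_cv.best_score_)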
Update 2

Sorry, I only saw today that my introduction text from the notebook was missing, lol.
When calculating roc_auc_score you have the option (it does not matter whether it is with or without grid search, with or without a pipeline) of passing either hard labels (0/1) or probabilities (e.g. 0.995, 0.6655). The labels are easy to get: you just convert your probabilities to labels. However, that results in a ROC plot shaped like a straight, reversed L, which sometimes looks ugly. The other option is to pass the predicted probabilities to roc_auc_score, which results in a staircase-shaped (reversed L) plot and looks much better.

So what you should test first is whether you can get a ROC AUC score with labels, with and without the grid. If that is the case, you should then try to get probabilities. And there, I believe, you have to write your own scoring method, as the roc_auc scoring in the grid only serves labels, which results in high roc_auc scores. I wrote something for you, so you can see the label approach:
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, roc_auc_score

cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

xgb_model = xgb.XGBClassifier(objective="binary:logistic",
                              eval_metric="auc",
                              use_label_encoder=False,
                              colsample_bytree=0.3,
                              learning_rate=0.1,
                              max_depth=5,
                              gamma=10,
                              n_estimators=10,
                              verbosity=None)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

xgb_model.fit(X_train, y_train)

# hard 0/1 labels, not probabilities
preds = xgb_model.predict(X_test)

print(confusion_matrix(y_test, preds))
print('ROC AUC Score', roc_auc_score(y_test, preds))
Gives:
[[51  3]
 [ 2 87]]
ROC AUC Score 0.9609862671660424
Here you can see it is ridiculously high.
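For comparison, here is a sketch of the probability approach for the same model (continuing the snippet above; predict_proba(...)[:, 1] is the predicted probability of the positive class):

# predicted probabilities for the positive class instead of hard labels
proba = xgb_model.predict_proba(X_test)[:, 1]
print('ROC AUC Score (probas)', roc_auc_score(y_test, proba))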
If you want to do it with the grid, get rid of this:

# Fit
_ = grid_cv.fit(X, y)

and just call grid_cv.fit(X, y). fit is a method applied to grid_cv, and the results are stored within grid_cv, so

print(grid_cv.best_score_)

should deliver the AUC as you have already defined it.
See also: different roc_auc with XGBoost gridsearch scoring='roc_auc' and roc_auc_score?
But this will probably also be ridiculously high, as you are most likely serving labels instead of probabilities.
Beware also of: What is the difference between cross_val_score with scoring='roc_auc' and roc_auc_score?
And nothing stops you from applying the roc_auc_score function to your grid results yourself...
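For example, something along these lines (a sketch, assuming a held-out X_test/y_test split that the grid search was not fitted on):

from sklearn.metrics import roc_auc_score

# the refitted best estimator is available after grid_cv.fit(...)
best_model = grid_cv.best_estimator_

# score with the positive-class probabilities on held-out data
test_proba = best_model.predict_proba(X_test)[:, 1]
print("held-out ROC AUC:", roc_auc_score(y_test, test_proba))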