While tuning XGBoost hyperparameters, I ran into a problem with the roc_auc_score metric: the score I get during cross-validation is significantly different from the score on the training data.
import optuna
import xgboost as xgb


class OptunaHyperparamsSearch:
    def __init__(self, X_train, y_train, **kwargs):
        ...

    def objective(self, trial):
        ...
        cv_results = xgb.cv(param, self.dtrain, num_boost_round=5, metrics=['auc'], nfold=5, verbose_eval=True)
        mean_auc = cv_results['test-auc-mean'].max()
        boost_rounds = cv_results['test-auc-mean'].idxmax()
        # idxmax() is a 0-based row index: row i holds the score after i + 1 rounds
        param['n_estimators'] = boost_rounds + 1
        trial.set_user_attr('param', param)
        print('boost_rounds: ', boost_rounds)
        print('train-auc-mean', cv_results['train-auc-mean'][boost_rounds])
        return mean_auc

    def best_model(self, n_trials=100, save_path=None):
        study = optuna.create_study(direction="maximize")
        study.optimize(self.objective, n_trials=n_trials)
        best_params = study.best_trial.user_attrs['param']
        best_model = xgb.XGBClassifier(**best_params)
        best_model.fit(self.X_train, self.y_train)
        return best_model
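A note on the round bookkeeping in objective: the DataFrame returned by xgb.cv has one row per boosting round, and row i holds the metrics after i + 1 trees (these are the [0] to [4] lines in the log output further down). So idxmax() returns a 0-based row index, and the number of trees for the best round is that index plus one; using the raw index as n_estimators trains one tree too few. A minimal stdlib sketch of the conversion, where a plain list of the logged test-auc-mean values stands in for the pandas column and best_num_rounds is a hypothetical helper:

```python
# test-auc-mean per round, copied from the xgb.cv log in this question
test_auc_mean = [0.771169, 0.777492, 0.785307, 0.789897, 0.789997]


def best_num_rounds(scores):
    """Return (0-based index of the best round, number of trees to train).

    Hypothetical helper; mirrors Series.idxmax(), which also returns
    the first position when several rounds tie for the maximum.
    """
    best_idx = max(range(len(scores)), key=scores.__getitem__)
    return best_idx, best_idx + 1  # row i is the score after i + 1 rounds


best_idx, n_estimators = best_num_rounds(test_auc_mean)
print(best_idx, n_estimators)  # 4 5
```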
After running the code:
search = OptunaHyperparamsSearch(X_train, y_train)
model = search.best_model(n_trials=1)
I received:
[0] train-auc:0.777869+0.00962852 test-auc:0.771169+0.025347
[1] train-auc:0.786905+0.00865646 test-auc:0.777492+0.0255523
[2] train-auc:0.793305+0.00480249 test-auc:0.785307+0.0198732
[3] train-auc:0.79595+0.00349561 test-auc:0.789897+0.0158569
[4] train-auc:0.796818+0.00407504 test-auc:0.789997+0.016069
boost_rounds: 4
train-auc-mean 0.796818
[I 2020-06-04 10:12:25,093] Finished trial#0 with value: 0.7899968 with parameters: {'booster': 'dart', 'reg_lambda': 0.8001057111479173, 'reg_alpha': 0.0016960618598770582, 'max_depth': 8, 'min_child_weight': 4, 'learning_rate': 0.0602235073221647, 'gamma': 0.0011248451567255984, 'colsample_bytree': 0.911487203002922, 'subsample': 0.9057485217255851, 'grow_policy': 'lossguide', 'scale_pos_weight': 0.5865962792358733, 'sample_type': 'weighted', 'normalize_type': 'tree', 'rate_drop': 0.0009459988874640169, 'skip_drop': 8.103200442539776e-05}. Best is trial#0 with value: 0.7899968.
So the result is about 0.8 (train-auc-mean 0.796818). But then I ran:
y_pred = model.predict(X_train)
print(roc_auc_score(y_train, y_pred))
I received:
0.598231710442728
That can't be right. I also tried a custom evaluation function:
from sklearn.metrics import roc_auc_score
import numpy as np
import xgboost as xgb

def PyAUC(predt: np.ndarray, dtrain: xgb.DMatrix):
    y = dtrain.get_label()
    return 'PyAUC', roc_auc_score(y, predt)
and passed it to xgb.cv via feval, setting param['disable_default_eval_metric'] = 1 and leaving metrics undefined. The result was the same.
Then I tried to use RandomizedSearchCV:
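from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from xgboost import XGBClassifier

params = {
    'min_child_weight': [1, 5, 10],
    'gamma': [0.5, 1, 1.5, 2, 5],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0],
    'max_depth': [3, 4, 5]
}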
alg = XGBClassifier(learning_rate=0.01, n_estimators=5, objective='binary:logistic',
                    silent=True, nthread=1)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=1001)
random_search = RandomizedSearchCV(alg, param_distributions=params, n_iter=10, scoring='roc_auc', n_jobs=4,
                                   cv=skf.split(X_train, y_train), verbose=3, random_state=1001)
random_search.fit(X_train, y_train)
print('\n All results:')
print(random_search.cv_results_)
y_pred = random_search.predict(X_train)
print(roc_auc_score(y_train, y_pred))
The output was:
All results:
{'mean_fit_time': array([0.27621794, 0.40631523, 0.36202598, 0.32188687, 0.34574351,
0.2747798 , 0.31780529, 0.32190156, 0.34060073, 0.25945067]), 'std_fit_time': array([0.02603387, 0.04572275, 0.09460844, 0.01841953, 0.08391794,
0.03654419, 0.01583525, 0.03670047, 0.01035465, 0.03085039]), 'mean_score_time': array([0.01927972, 0.0143033 , 0.01697631, 0.01260743, 0.02442002,
0.02089334, 0.0182806 , 0.0132216 , 0.01498265, 0.01320119]), 'std_score_time': array([0.00609847, 0.00671443, 0.00613005, 0.00410744, 0.00384849,
0.00516041, 0.00505873, 0.00276774, 0.00023382, 0.00546102]), 'param_subsample': masked_array(data=[1.0, 0.6, 0.8, 1.0, 0.8, 1.0, 1.0, 0.8, 0.8, 0.8],
mask=[False, False, False, False, False, False, False, False,
False, False],
fill_value='?',
dtype=object), 'param_min_child_weight': masked_array(data=[5, 1, 5, 5, 1, 10, 1, 1, 1, 1],
mask=[False, False, False, False, False, False, False, False,
False, False],
fill_value='?',
dtype=object), 'param_max_depth': masked_array(data=[3, 5, 5, 5, 4, 4, 5, 3, 5, 4],
mask=[False, False, False, False, False, False, False, False,
False, False],
fill_value='?',
dtype=object), 'param_gamma': masked_array(data=[5, 1.5, 1, 5, 1, 1.5, 5, 2, 0.5, 1.5],
mask=[False, False, False, False, False, False, False, False,
False, False],
fill_value='?',
dtype=object), 'param_colsample_bytree': masked_array(data=[1.0, 0.8, 0.8, 0.6, 1.0, 0.6, 0.6, 0.8, 0.6, 0.6],
mask=[False, False, False, False, False, False, False, False,
False, False],
fill_value='?',
dtype=object), 'params': [{'subsample': 1.0, 'min_child_weight': 5, 'max_depth': 3, 'gamma': 5, 'colsample_bytree': 1.0}, {'subsample': 0.6, 'min_child_weight': 1, 'max_depth': 5, 'gamma': 1.5, 'colsample_bytree': 0.8}, {'subsample': 0.8, 'min_child_weight': 5, 'max_depth': 5, 'gamma': 1, 'colsample_bytree': 0.8}, {'subsample': 1.0, 'min_child_weight': 5, 'max_depth': 5, 'gamma': 5, 'colsample_bytree': 0.6}, {'subsample': 0.8, 'min_child_weight': 1, 'max_depth': 4, 'gamma': 1, 'colsample_bytree': 1.0}, {'subsample': 1.0, 'min_child_weight': 10, 'max_depth': 4, 'gamma': 1.5, 'colsample_bytree': 0.6}, {'subsample': 1.0, 'min_child_weight': 1, 'max_depth': 5, 'gamma': 5, 'colsample_bytree': 0.6}, {'subsample': 0.8, 'min_child_weight': 1, 'max_depth': 3, 'gamma': 2, 'colsample_bytree': 0.8}, {'subsample': 0.8, 'min_child_weight': 1, 'max_depth': 5, 'gamma': 0.5, 'colsample_bytree': 0.6}, {'subsample': 0.8, 'min_child_weight': 1, 'max_depth': 4, 'gamma': 1.5, 'colsample_bytree': 0.6}], 'split0_test_score': array([0.75734333, 0.78965043, 0.78929122, 0.77842559, 0.78669592,
0.77856369, 0.7803955 , 0.77733652, 0.78884686, 0.77706318]), 'split1_test_score': array([0.7564997 , 0.78553601, 0.78621578, 0.77250155, 0.78589665,
0.77237991, 0.77235486, 0.77187115, 0.78573708, 0.77046652]), 'split2_test_score': array([0.75575839, 0.77356843, 0.79002323, 0.77134164, 0.76641651,
0.76965581, 0.77133806, 0.76749842, 0.79029943, 0.77043647]), 'split3_test_score': array([0.74596394, 0.77188117, 0.76967513, 0.76816388, 0.76832059,
0.76795065, 0.76942182, 0.76217902, 0.76846871, 0.75720452]), 'split4_test_score': array([0.78099172, 0.80616938, 0.80491224, 0.80371433, 0.81990511,
0.82052725, 0.80327483, 0.80598102, 0.8171982 , 0.8052647 ]), 'mean_test_score': array([0.75931142, 0.78536108, 0.78802352, 0.7788294 , 0.78544696,
0.78181546, 0.77935701, 0.77697323, 0.79011006, 0.77608708]), 'std_test_score': array([0.01159822, 0.0124273 , 0.0112318 , 0.01287854, 0.01920727,
0.01968907, 0.01253142, 0.0153379 , 0.01563886, 0.01595216]), 'rank_test_score': array([10, 4, 2, 7, 3, 5, 6, 8, 1, 9], dtype=int32)}
0.6093407594278569
So the problem persists: the cross-validation score is about 0.8, but afterwards it is about 0.6. I suspect that different metrics are being used.
The workaround I found was to pass scoring=make_scorer(roc_auc_score) to RandomizedSearchCV. This made the results consistent: about 0.6 both in cross-validation and afterwards.
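This is also why make_scorer(roc_auc_score) made the numbers agree: by default, make_scorer builds a scorer that calls the estimator's predict, so it scores hard 0/1 labels, whereas the built-in scoring='roc_auc' scores continuous outputs (decision_function, or predict_proba for a classifier without decision_function, such as XGBClassifier). For hard binary predictions the ROC curve has a single interior point, so the AUC reduces to balanced accuracy, (TPR + TNR) / 2. The sketch below checks all three on synthetic data; RandomForestClassifier and make_classification are stand-ins chosen only to keep it self-contained, not part of the original setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score, get_scorer, make_scorer, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (not the asker's dataset)
X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Stand-in classifier: has predict_proba but no decision_function, like XGBClassifier
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

builtin = get_scorer('roc_auc')(clf, X_te, y_te)      # scores predict_proba[:, 1]
custom = make_scorer(roc_auc_score)(clf, X_te, y_te)  # scores predict() labels

auc_proba = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
auc_labels = roc_auc_score(y_te, clf.predict(X_te))

# With 0/1 predictions, ROC AUC equals balanced accuracy: (TPR + TNR) / 2
bal_acc = balanced_accuracy_score(y_te, clf.predict(X_te))

print(f"scoring='roc_auc':          {builtin:.4f}")
print(f"make_scorer(roc_auc_score): {custom:.4f}")
print(f"balanced accuracy:          {bal_acc:.4f}")
```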
Can anyone explain what the problem was? I still don't understand it, and I still don't know how to solve it when optimizing with Optuna.
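You're using model.predict, but the ROC curve and roc_auc_score need predicted probabilities (or some other non-thresholded confidence score); use the positive-class column of model.predict_proba instead, i.e. roc_auc_score(y_train, model.predict_proba(X_train)[:, 1]).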
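Why this matters: ROC AUC measures how well the scores rank positives above negatives. predict thresholds the probabilities (at 0.5 for a binary XGBClassifier) before you ever see them, so every example collapses to 0 or 1 and most of the ranking information is lost. A dependency-free sketch, assuming made-up toy scores and a hypothetical helper pairwise_auc that computes AUC as the fraction of (positive, negative) pairs ranked correctly, with ties counted as half:

```python
from itertools import product


def pairwise_auc(y_true, y_score):
    """Hypothetical helper: ROC AUC as P(random positive outscores random negative), ties = 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Toy labels and made-up probabilities (illustrative only)
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
proba = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
labels = [1 if p > 0.5 else 0 for p in proba]  # roughly what predict() returns

print(pairwise_auc(y_true, proba))   # 0.9375
print(pairwise_auc(y_true, labels))  # 0.75
```

The same scores give a much lower AUC once they are thresholded, which is the kind of gap seen between the cross-validation scores and the predict-based check in the question.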