
XGBoost: Help interpreting the booster behaviour. Why is the 0th iteration always reported as the best?


I am training an XGBoost model and having trouble interpreting the model behaviour.

  • early_stopping_rounds=10
  • num_boost_round=100
  • Dataset is unbalanced, with 458644 1s and 7975373 0s
  • Evaluation metric is AUCPR
  • param = {'max_depth': 6, 'eta': 0.03, 'silent': 1, 'colsample_bytree': 0.3, 'objective': 'binary:logistic', 'nthread': 6, 'subsample': 1, 'eval_metric': ['aucpr']}

As I understand "early_stopping_rounds", training is supposed to stop once the eval metric (aucpr) on the test/evaluation dataset has shown no improvement for 10 consecutive rounds. However, in my case, even though the AUCPR of the evaluation dataset is clearly improving, training still stops after the 10th boosting stage. Please see the training log below. Additionally, the best iteration is reported as the 0th one, when the 10th iteration clearly has a much higher AUCPR than the 0th.

[Image: training log]

Is this right? If not, what could be going wrong? If so, please correct my understanding of early_stopping_rounds and the best iteration.


Solution

  • Very interesting!!

    So it turns out that early stopping decides per metric which direction counts as an improvement: it looks to minimize some metrics (RMSE, log loss, etc.) and to maximize others (MAP, NDCG, AUC) - https://xgboost.readthedocs.io/en/latest/python/python_intro.html

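    When you use aucpr, it is actually trying to minimize it - perhaps that's the default behavior for any metric it does not recognize as one to maximize. That would explain your log: if AUCPR rises every round, the 0th iteration has the lowest value, so it stays the "best" one, and training stops 10 rounds later because the metric never went below it.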

    Try setting maximize=True when calling xgboost.train() - https://github.com/dmlc/xgboost/issues/3712
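
To make the effect of the optimization direction concrete, here is a minimal sketch in plain Python. It is not XGBoost's actual implementation: `simulate_early_stopping` is a hypothetical helper that mimics the usual "stop once the best round is `patience` rounds old" rule, and the AUCPR values are made up rather than taken from the asker's log. The second function, `train_maximizing_aucpr`, is also hypothetical and assumes `dtrain` and `dtest` are already-built `xgboost.DMatrix` objects; it reuses the question's parameters except for `silent`, which newer XGBoost releases no longer use.

```python
def simulate_early_stopping(scores, patience, maximize):
    """Return (best_iteration, last_iteration_run) for per-round metric values.

    Illustrative only: stops once the best round is `patience` rounds old.
    """
    best_iter = 0
    best_score = scores[0]
    for i, score in enumerate(scores):
        improved = score > best_score if maximize else score < best_score
        if i == 0 or improved:
            best_iter, best_score = i, score
        elif i - best_iter >= patience:
            return best_iter, i  # stopped early at round i
    return best_iter, len(scores) - 1  # ran all rounds


# Made-up, steadily rising AUCPR values (NOT the values from the question's log).
rising_aucpr = [(100 + 5 * i) / 1000 for i in range(100)]

# Treating AUCPR as something to minimize: round 0 stays "best", stop at round 10.
print(simulate_early_stopping(rising_aucpr, patience=10, maximize=False))  # (0, 10)

# Treating AUCPR as something to maximize: every round improves, nothing stops early.
print(simulate_early_stopping(rising_aucpr, patience=10, maximize=True))  # (99, 99)


def train_maximizing_aucpr(dtrain, dtest):
    """Hypothetical helper: the question's setup with maximize=True added."""
    import xgboost as xgb  # imported lazily so the simulation above runs without xgboost

    params = {
        "max_depth": 6,
        "eta": 0.03,
        "colsample_bytree": 0.3,
        "objective": "binary:logistic",
        "nthread": 6,
        "subsample": 1,
        "eval_metric": ["aucpr"],
    }
    return xgb.train(
        params,
        dtrain,
        num_boost_round=100,
        evals=[(dtest, "eval")],
        early_stopping_rounds=10,
        maximize=True,  # tell early stopping that a higher aucpr is better
    )
```

With the minimizing rule, the simulation reproduces the symptom from the question: the 0th iteration is reported as best and training ends at the 10th round. After a real run with `maximize=True`, `best_iteration` on the returned booster should point at the round with the highest AUCPR instead.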