
R xgboost xgb.cv pred values: best iteration or final iteration?


I am using the xgb.cv function to grid-search the best hyperparameters in the R implementation of xgboost. When setting prediction = TRUE, it returns the predictions for the out-of-fold observations. Assuming you are using early stopping, do these predictions correspond to the best iteration, or are they the predictions of the final iteration?


Solution

  • The CV predictions correspond to the best iteration. You can verify this by using a 'strict' early_stopping_rounds value, then comparing the CV predictions with those made by models trained with the 'best' and the 'final' number of iterations, e.g.:

    # Load a minimal reproducible example
    library(xgboost)
    data(agaricus.train, package='xgboost')
    data(agaricus.test, package='xgboost')
    train <- agaricus.train
    dtrain <- xgb.DMatrix(train$data, label=train$label)
    test <- agaricus.test
    dtest <- xgb.DMatrix(test$data, label=test$label)
    
    # Perform cross-validation with a 'strict' early_stopping_rounds value
    cv <- xgb.cv(data = train$data, label = train$label, nfold = 5, max_depth = 2,
                 eta = 1, nthread = 4, nrounds = 10, objective = "binary:logistic",
                 prediction = TRUE, early_stopping_rounds = 1)
    
    # Check which round was the best iteration (the last round before early stopping kicked in)
    print(cv$best_iteration)
    [1] 3
    
    # Get the predictions
    head(cv$pred)
    [1] 0.84574515 0.15447612 0.15390711 0.84502697 0.09661318 0.15447612
    
    # Train a model using 3 rounds (corresponds to best iteration)
    trained_model <- xgb.train(data = dtrain, max_depth = 2,
                  eta = 1, nthread = 4, nrounds = 3,
                  watchlist = list(train = dtrain, eval = dtrain),
                  objective = "binary:logistic")
    # Get predictions
    head(predict(trained_model, dtrain))
    [1] 0.84625006 0.15353635 0.15353635 0.84625006 0.09530514 0.15353635
    
    # Train a model using 10 rounds (corresponds to final iteration)
    trained_model <- xgb.train(data = dtrain, max_depth = 2,
                  eta = 1, nthread = 4, nrounds = 10,
                  watchlist = list(train = dtrain, eval = dtrain),
                  objective = "binary:logistic")
    head(predict(trained_model, dtrain))
    [1] 0.9884467125 0.0123147098 0.0050151693 0.9884467125 0.0008781737 0.0123147098
    

    So the predictions from the CV are approximately the same as the predictions made when the number of iterations is 'best', not 'final'. (They will never match exactly, because the CV predictions are out-of-fold: each one comes from a fold model trained on only part of the data, not from a single model trained on the full training set.)
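A rough way to quantify this is to compare mean absolute differences. The sketch below (assuming the same agaricus data as above; fold assignment is random, so exact numbers vary by seed) trains one model at the best iteration and one at the final iteration, and measures how far each sits from the CV predictions:

```r
library(xgboost)
data(agaricus.train, package = "xgboost")
train  <- agaricus.train
dtrain <- xgb.DMatrix(train$data, label = train$label)

# Reproduce the strict-early-stopping CV from above
set.seed(1)
cv <- xgb.cv(data = train$data, label = train$label, nfold = 5, max_depth = 2,
             eta = 1, nthread = 4, nrounds = 10, objective = "binary:logistic",
             prediction = TRUE, early_stopping_rounds = 1, verbose = FALSE)

# One model stopped at the best iteration, one run out to the final iteration
model_best  <- xgb.train(data = dtrain, max_depth = 2, eta = 1, nthread = 4,
                         nrounds = cv$best_iteration, objective = "binary:logistic")
model_final <- xgb.train(data = dtrain, max_depth = 2, eta = 1, nthread = 4,
                         nrounds = 10, objective = "binary:logistic")

# Mean absolute difference to the CV predictions: the best-iteration model
# sits much closer than the final-iteration model
mean(abs(cv$pred - predict(model_best, dtrain)))
mean(abs(cv$pred - predict(model_final, dtrain)))
```

The first difference should be small and the second considerably larger, which is another way of seeing that cv$pred was produced at the best iteration, not the final one.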