I am trying to use R's xgboost package, but there is something I find confusing. In the xgboost manual, under the xgb.cv function, it says:
The original sample is randomly partitioned into nfold equal size subsamples.
Of the nfold subsamples, a single subsample is retained as the validation data for testing the model, and the remaining nfold - 1 subsamples are used as training data.
The cross-validation process is then repeated nrounds times, with each of the nfold subsamples used exactly once as the validation data.
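To make sure I read that correctly, here is a minimal sketch of just the partitioning step (using sample() to assign fold ids; this is my own illustration, not xgboost's internal code):

set.seed(1)
nfold <- 5
n <- 100                                     # e.g. 100 rows of data
fold_id <- sample(rep(1:nfold, length.out = n))
# rows with fold_id == k form the validation subsample for fold k;
# the remaining nfold - 1 subsamples form the training data
table(fold_id)                               # roughly equal-sized subsamples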
And this is the code in the manual:
data(agaricus.train, package = 'xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
cv <- xgb.cv(data = dtrain, nrounds = 3, nthread = 2, nfold = 5,
             metrics = list("rmse", "auc"),
             max_depth = 3, eta = 1, objective = "binary:logistic")
print(cv)
print(cv, verbose = TRUE)
And the result is:
##### xgb.cv 5-folds
call:
xgb.cv(data = dtrain, nrounds = 3, nfold = 5, metrics = list("rmse",
"auc"), nthread = 2, max_depth = 3, eta = 1, objective = "binary:logistic")
params (as set within xgb.cv):
nthread = "2", max_depth = "3", eta = "1", objective = "binary:logistic",
eval_metric = "rmse", eval_metric = "auc", silent = "1"
callbacks:
cb.print.evaluation(period = print_every_n, showsd = showsd)
cb.evaluation.log()
niter: 3
evaluation_log:
iter train_rmse_mean train_rmse_std train_auc_mean train_auc_std test_rmse_mean test_rmse_std test_auc_mean test_auc_std
1 0.1623756 0.002693092 0.9871108 1.123550e-03 0.1625222 0.009134128 0.9870954 0.0045008818
2 0.0784902 0.002413883 0.9998370 1.317346e-04 0.0791366 0.004566554 0.9997756 0.0003538184
3 0.0464588 0.005172930 0.9998942 7.315846e-05 0.0478028 0.007763252 0.9998902 0.0001347032
Let's say nfold = 5 and nrounds = 2. That means the data is split into 5 equal-sized parts, and the algorithm will generate 2 trees.
My understanding is: each subsample has to serve as the validation data once. When one subsample is the validation data, 2 trees will be generated (trained on the other 4 subsamples). So we will have 5 sets of trees (each set has 2 trees, because nrounds = 2). Then we check whether the evaluation metric varies a lot or not.
But the result does not read that way. Each nrounds value has one line of evaluation metrics, which looks like it already includes the 'cross validation' part. So, if 'the cross-validation process is then repeated nrounds times', how can 'each of the nfold subsamples [be] used exactly once as the validation data'?
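To be concrete, here is a rough sketch of what I expected xgb.cv to be doing internally, written with xgb.train and a manual fold assignment (fold_id, dtrain_k, dvalid_k and the sample()-based split are my own names for illustration; this is not xgboost's actual implementation):

library(xgboost)
data(agaricus.train, package = 'xgboost')
X <- agaricus.train$data
y <- agaricus.train$label

set.seed(1)
fold_id <- sample(rep(1:5, length.out = nrow(X)))   # nfold = 5

models <- list()
for (k in 1:5) {
  # fold k is held out as validation data, the other 4 folds are training data
  dtrain_k <- xgb.DMatrix(X[fold_id != k, ], label = y[fold_id != k])
  dvalid_k <- xgb.DMatrix(X[fold_id == k, ], label = y[fold_id == k])
  # one model of 2 trees per held-out fold, since nrounds = 2
  models[[k]] <- xgb.train(params = list(max_depth = 3, eta = 1,
                                         objective = "binary:logistic",
                                         eval_metric = "auc"),
                           data = dtrain_k, nrounds = 2,
                           watchlist = list(test = dvalid_k),
                           verbose = 0)
}

Under that reading I would end up with 5 separate sets of 2 trees, and the spread of the 5 validation scores would tell me whether the metric varies a lot.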
Those are the means and standard deviations of the scores of the nfold fit-test[1] procedures run at every round in nrounds. The XGBoost cross-validation process proceeds like this:
[1] Note that what I would call the 'validation' set is identified by XGBoost as the 'test' set in the evaluation log.
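You can see that structure directly in the object you printed: the evaluation_log component has one row per boosting round (niter = 3 here), not one row per fold, and each *_mean / *_std pair is computed over the nfold = 5 fold scores at that round. A quick way to inspect it (column names taken from your output):

cv$evaluation_log                  # one row per round, 3 rows in total
cv$evaluation_log$test_auc_mean    # mean validation ('test') AUC over the 5 folds, rounds 1..3
cv$evaluation_log$test_auc_std     # standard deviation of the same 5 fold scores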