machine-learning, modeling, cross-validation, xgboost

Does Suboptimal Early Stopping Prevent Overfitting in Machine Learning?


I have been using the early stopping feature of xgboost for a variety of problems, mostly classification. I have made the following observations when working on a couple of datasets from different domains:

  • At the point of minimum evaluation error, when the gap between the train and test errors (test being the set used for evaluation to stop training rounds) is relatively large, the model seems to behave as if it has overfit.

  • In such situations, when I instead stop the training rounds at the point where the train and test (evaluation data during training) errors are reasonably similar (even though the evaluation error is not at its minimum), the models perform better and in line with the estimated errors.

Therefore the question is: should training be stopped much earlier than the optimal point when, at that optimum, there is a large divergence between the train and test (eval) errors, even though the evaluation error is at its lowest?

Please assume that every care has been taken to correctly split the datasets into train, test, validation, etc.

Thanks.


Solution

  • Early stopping in xgboost works as follows:

    • It watches the last tuple of your "watchlist" (usually you put the validation/test set there).
    • It evaluates that set with your evaluation metric.
    • If that evaluation hasn't improved for x rounds (where x = early_stopping_rounds),
    • the model stops training and records the best iteration (the one with the best evaluation score on your test/validation set).

    Yes, your model will be built with x unnecessary iterations (boosters). But assuming you have a trained xgboost.Booster in clf:

    # Will give you the best iteration
    best_iteration = clf.best_ntree_limit
    
    # Will predict using only the boosters up to the best iteration
    y_pred = clf.predict(dtest, ntree_limit=best_iteration)
    

    Which amounts to a "no" to your question.
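
    For completeness, here is a minimal end-to-end sketch of how the watchlist and early_stopping_rounds fit together. The dataset, parameter values, and variable names are illustrative assumptions, not taken from the question:

    import numpy as np
    import xgboost as xgb

    # Illustrative random data; in practice use your own train/validation split.
    rng = np.random.default_rng(0)
    X_train, y_train = rng.normal(size=(800, 10)), rng.integers(0, 2, 800)
    X_valid, y_valid = rng.normal(size=(200, 10)), rng.integers(0, 2, 200)

    dtrain = xgb.DMatrix(X_train, label=y_train)
    dvalid = xgb.DMatrix(X_valid, label=y_valid)

    # The last tuple in the watchlist is the one monitored for early stopping.
    watchlist = [(dtrain, "train"), (dvalid, "eval")]

    clf = xgb.train(
        params={"objective": "binary:logistic", "eval_metric": "logloss"},
        dtrain=dtrain,
        num_boost_round=1000,
        evals=watchlist,
        early_stopping_rounds=50,  # stop if "eval" logloss hasn't improved in 50 rounds
    )

    # Predict using only the trees up to the best iteration
    # (newer xgboost versions expose best_iteration / iteration_range instead).
    y_pred = clf.predict(dvalid, ntree_limit=clf.best_ntree_limit)

    The early stopping logic itself already selects the iteration with the best evaluation score, which is why the answer above recommends predicting with best_ntree_limit rather than stopping even earlier.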