
Does the learning curve suggest overfitting or an acceptable level of model performance?


[Learning curve: training vs. validation loss over boosting iterations]

The results are based on XGBoost. Do I need to re-tune the hyperparameters? If so, how should I tune them? Currently we use BayesSearchCV from scikit-optimize to tune the hyperparameters automatically. My search space is:

from skopt.space import Real, Integer

search_spaces = {'learning_rate': Real(0.0001, 0.04, 'uniform'),  # shrinkage applied at each boosting step
                 'max_depth': Integer(2, 20),                     # maximum depth of each tree
                 'subsample': Real(0.1, 1.0, 'uniform'),          # subsample ratio of training rows per tree
                 'colsample_bytree': Real(0.1, 1.0, 'uniform'),   # subsample ratio of columns per tree
                 'reg_lambda': Real(1e-9, 100., 'uniform'),       # L2 regularization, default = 1
                 'reg_alpha': Real(1e-9, 100., 'uniform'),        # L1 regularization, default = 0
                 'n_estimators': Integer(100, 3000),              # number of boosting rounds (trees)
                 'min_child_weight': Real(2, 8, 'uniform'),       # minimum sum of instance weight in a child
                 'gamma': Real(0.1, 0.9, 'uniform')               # minimum loss reduction required to split
}
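
For context, here is a minimal sketch of how this search space can be plugged into BayesSearchCV. The estimator choice (XGBRegressor), the 5-fold CV, the scoring metric, and the X_train/y_train data are assumptions for illustration, not part of the original setup:

from skopt import BayesSearchCV
from xgboost import XGBRegressor  # assumption: a regression task; swap in XGBClassifier if needed

opt = BayesSearchCV(
    estimator=XGBRegressor(objective='reg:squarederror', n_jobs=-1),
    search_spaces=search_spaces,
    n_iter=32,                    # number of hyperparameter settings to evaluate
    cv=5,                         # assumed 5-fold cross-validation
    scoring='neg_root_mean_squared_error',
    random_state=42,
)
opt.fit(X_train, y_train)         # X_train / y_train are placeholders for your data
print(opt.best_params_)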

Solution

  • In the picture you can see a classic overfitting curve.

    Typically, the training and validation losses decrease together and appear to converge, until they don't. Once they start diverging, the training loss keeps decreasing while the validation loss increases or stays flat. So, to answer your question: yes, I would say you are overfitting from roughly iteration 70. One common remedy, early stopping, is sketched below.
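
A minimal sketch of early stopping with the XGBoost scikit-learn API, assuming xgboost >= 1.6 (where early_stopping_rounds is a constructor argument) and placeholder splits X_train/y_train and X_val/y_val. The idea is to cap boosting near the point where the validation curve stops improving, rather than running all n_estimators rounds:

from xgboost import XGBRegressor  # assumption: a regression task; swap in XGBClassifier if needed

model = XGBRegressor(
    n_estimators=3000,            # upper bound; early stopping picks the effective count
    learning_rate=0.01,
    early_stopping_rounds=50,     # stop after 50 rounds with no validation improvement
    eval_metric='rmse',
)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
print(model.best_iteration)       # expect a value near the ~70 where the curves diverge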