I am still new to machine learning. I am trying to train my model using 5-fold CV with lgb.cv(), but I am not sure how to use the results in lgb.train(). That is, how can I use the CV results in my lgb_clf? I also don't understand the difference between cv() and train().
import lightgbm as lgb

lgbm_params = {
    'objective': 'binary',
    'metric': 'auc',
    'is_unbalance': True,
    'boosting': 'gbdt',
    'num_leaves': 31,
    'feature_fraction': 0.5,
    'bagging_fraction': 0.5,
    'bagging_freq': 20,
    'learning_rate': 0.05,
    'verbose': 0
}
metric = 'auc'
cv_folds = 5
num_rounds = 5000
lgtrain = lgb.Dataset(train, label=label)
lgvalid = lgb.Dataset(test, label=label)
# 5-fold cross-validation on the training data
cv = lgb.cv(lgbm_params, lgtrain, num_rounds, nfold=cv_folds, metrics=metric, early_stopping_rounds=100)
# A single model, early-stopped on the validation set
lgb_clf = lgb.train(lgbm_params, lgtrain, num_rounds, early_stopping_rounds=100, valid_sets=[lgtrain, lgvalid])
The question is what you want to use the cross-validation for. If you are estimating the generalization error for a predefined set of hyperparameters, you can take the output of lgb.cv directly, and there is no need to train the model again.
If, on the other hand, you are searching for optimal hyperparameter values, you would probe multiple points in the hyperparameter space, compute the cross-validation score for each, and choose the point with the best score. You would then retrain the model using the hyperparameters found this way.
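To make the two cases concrete, here is a minimal sketch rather than a definitive recipe. The helper names summarize_cv and search_and_retrain, the candidates list, and the choice to retrain on all of the training data are my own assumptions for illustration. I also assume a LightGBM version that provides the lgb.early_stopping callback, and that AUC is the only metric, so higher is better. The key holding the mean score differs between versions ('auc-mean' in 3.x, 'valid auc-mean' in 4.x), so the helper looks it up by suffix.

```python
def summarize_cv(cv_results):
    """Return (best_num_rounds, best_mean_score) from the dict returned by lgb.cv.

    Assumes a single metric where higher is better (e.g. AUC). The first key
    ending in '-mean' is used, which covers both 'auc-mean' and 'valid auc-mean'.
    """
    mean_key = next(k for k in cv_results if k.endswith("-mean"))
    means = cv_results[mean_key]
    # Index of the best mean score; boosting rounds are counted from 1.
    best_idx = max(range(len(means)), key=means.__getitem__)
    return best_idx + 1, means[best_idx]


def search_and_retrain(train, label, base_params, candidates, nfold=5):
    """Cross-validate each candidate, then retrain once with the best one.

    `candidates` is a hypothetical list of dicts that override `base_params`,
    e.g. [{'num_leaves': 15}, {'num_leaves': 31}].
    """
    import lightgbm as lgb  # imported here so summarize_cv has no dependency

    best = None
    for overrides in candidates:
        params = {**base_params, **overrides}
        # Build a fresh Dataset per candidate so parameters do not leak between runs.
        dtrain = lgb.Dataset(train, label=label)
        cv_results = lgb.cv(
            params,
            dtrain,
            num_boost_round=5000,
            nfold=nfold,
            callbacks=[lgb.early_stopping(100, verbose=False)],
        )
        rounds, score = summarize_cv(cv_results)
        if best is None or score > best[0]:
            best = (score, rounds, params)

    score, rounds, params = best
    # Final model: fixed number of rounds taken from CV, no validation set needed.
    model = lgb.train(params, lgb.Dataset(train, label=label), num_boost_round=rounds)
    return model, params, rounds, score
```

If you only want the generalization estimate for one fixed parameter set, calling summarize_cv(cv) on the output of your lgb.cv call is enough. The second function is only needed when you compare several parameter sets and want a final model afterwards.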