
Calibrating Probabilities in lightgbm or XGBoost


I need help calibrating probabilities in LightGBM.

Below is my code:

# cross-validate to find a good number of boosting rounds
cv_results = lgb.cv(params,
                    lgtrain,
                    nfold=10,
                    stratified=False,
                    num_boost_round=num_rounds,
                    verbose_eval=10,
                    early_stopping_rounds=50,
                    seed=50)

best_nrounds = cv_results.shape[0] - 1

# train with early stopping on the validation set
lgb_clf = lgb.train(params,
                    lgtrain,
                    num_boost_round=10000,
                    valid_sets=[lgtrain, lgvalid],
                    early_stopping_rounds=50,
                    verbose_eval=10)

# predict with the best iteration found by early stopping
ypred = lgb_clf.predict(test, num_iteration=lgb_clf.best_iteration)

Solution

  • I am not sure about LightGBM, but in the case of XGBoost the best, and quite possibly the only, way to calibrate the probabilities is to use CalibratedClassifierCV from sklearn.

    You can find it here - https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html

    The only catch is that CalibratedClassifierCV accepts only sklearn-compatible estimators as input, so you will have to use the sklearn wrapper for XGBoost instead of the traditional XGBoost API's .train function.

    You can find the XGBoost's sklearn wrapper here - https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBClassifier