
lightgbm.cv: cvbooster.best_iteration always returns -1


I am migrating from XGBoost to LightGBM (since I need its exact handling of interaction constraints) and I am struggling to understand the result of LightGBM CV. In the example below, the minimum log-loss is achieved on iteration 125, but model['cvbooster'].best_iteration returns -1. I would have expected it to return 125 as well - or am I misunderstanding something here? Is there a better way to get the best iteration, or does one just need to manually check?

I have seen this discussion but even when I check the boosters in cvbooster (e.g., model['cvbooster'].boosters[0].best_iteration), they all return -1 as well...

import lightgbm as lgb
import numpy as np
from sklearn import datasets

X, y = datasets.make_classification(n_samples=10_000, n_features=5, n_informative=3, random_state=9)

data_train_lgb = lgb.Dataset(X, label=y)

param = {'objective':   'binary',
         'metric':      ['binary_logloss'],
         'device_type': 'cuda'}

model = lgb.cv(param,
               data_train_lgb,
               num_boost_round=1_000,
               return_cvbooster=True)

opt_1 = np.argmin(model['valid binary_logloss-mean'])
print(f"index argmin: {opt_1}")
print(f"logloss argmin: {model['valid binary_logloss-mean'][opt_1]}")

opt_2 = model['cvbooster'].best_iteration
print(f"index best_iteration: {opt_2}")
print(f"logloss best_iteration: {model['valid binary_logloss-mean'][opt_2]}")

---

>>> index argmin: 125
>>> logloss argmin: 0.13245999867688793

>>> index best_iteration: -1
>>> logloss best_iteration: 0.2661896445658779

Solution

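  • In lightgbm (the Python package for LightGBM), best_iteration is not found by searching the evaluation history for the best score. It is only set when early stopping is used, and it then holds the last iteration (1-based) at which performance on the evaluation metrics improved. Without early stopping it keeps its default value of -1. Because it is 1-based, a best_iteration of 115 corresponds to index 114 in the 0-based list of results.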

    See this example (using lightgbm==4.5.0, scikit-learn==1.6.0, and Python 3.11).

    import lightgbm as lgb
    import numpy as np
    from sklearn import datasets
    
    X, y = datasets.make_classification(
        n_samples=10_000,
        n_features=5,
        n_informative=3,
        random_state=9
    )
    
    params = {
        "deterministic": True,
        "objective": "binary",
        "metric": "binary_logloss",
        "seed": 708
    }
    
    # train without early stopping
    model = lgb.cv(
        params=params,
        train_set=lgb.Dataset(X, label=y),
        num_boost_round=200,
        return_cvbooster=True
    )
    
    model['cvbooster'].best_iteration
    # -1
    
    opt_1 = np.argmin(model['valid binary_logloss-mean'])
    print(f"index argmin: {opt_1}")
    # index argmin: 114
    print(f"logloss argmin: {model['valid binary_logloss-mean'][opt_1]:.6f}")
    # logloss argmin: 0.132579
    
    # train WITH early stopping
    model = lgb.cv(
        params={**params, "early_stopping_rounds": 5},
        train_set=lgb.Dataset(X, label=y),
        num_boost_round=200,
        return_cvbooster=True
    )
    
    model['cvbooster'].best_iteration
    # 115
    
    opt_1 = np.argmin(model['valid binary_logloss-mean'])
    print(f"index argmin: {opt_1}")
    # index argmin: 114
    print(f"logloss argmin: {model['valid binary_logloss-mean'][opt_1]:.6f}")
    # logloss argmin: 0.132579
    

    Notes on that:

    • adding "deterministic": True and setting "seed" to a positive value helps make training deterministic
    • early stopping in cv() can be enabled by passing a positive value for "early_stopping_rounds" through params
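
If early stopping is not an option, the best iteration can be recovered by hand from the results dictionary. The following is a minimal sketch, not part of lightgbm: the helper name best_iteration_from_cv is hypothetical, the default key "valid binary_logloss-mean" is taken from the example above (other metrics or lightgbm versions may use a different key), and it assumes a metric where lower is better. The dictionary used here is a synthetic stand-in for the output of lgb.cv(), so the snippet runs without training anything.

```python
def best_iteration_from_cv(cv_results, metric_key="valid binary_logloss-mean"):
    """Return (1-based best iteration, best mean score) from a cv() results dict.

    Hypothetical helper: assumes lower metric values are better and that
    cv_results[metric_key] is a sequence with one mean score per boosting round.
    """
    scores = list(cv_results[metric_key])
    # 0-based position of the lowest score (first occurrence on ties)
    best_index = min(range(len(scores)), key=scores.__getitem__)
    # convert to 1-based to match the convention used by best_iteration
    return best_index + 1, scores[best_index]


# Synthetic stand-in for the dict returned by lgb.cv() (assumed values)
example_results = {
    "valid binary_logloss-mean": [0.60, 0.35, 0.20, 0.14, 0.13, 0.15, 0.19]
}

best_iter, best_score = best_iteration_from_cv(example_results)
print(f"best iteration (1-based): {best_iter}")
print(f"best mean logloss: {best_score}")
```

Applied to the model dictionary from the example above, where the argmin index is 114, this helper would be expected to return 115 as the iteration, matching what best_iteration reports when early stopping is enabled.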