
lightgbm sklearn n_estimators and n_estimators_


I set n_estimators to 50 in lightgbm's sklearn interface. After fitting finished, n_estimators_ was 100. Why is this the case?

regressor = lightgbm.LGBMRegressor(n_estimators=50)

n_estimators (int, optional (default=100)) – Number of boosted trees to fit.

n_estimators_: True number of boosting iterations performed.

Why is n_estimators_ double n_estimators? Aren't they supposed to be the same?


Solution

  • With lightgbm==4.3.0 on macOS (Python 3.11.7, scikit-learn==1.4.1), I'm not able to reproduce that.

    import lightgbm
    from sklearn.datasets import make_regression
    
    X, y = make_regression(n_samples=10_000)
    regressor = lightgbm.LGBMRegressor(n_estimators=50)
    regressor.fit(X, y)
    
    regressor.n_estimators
    # 50
    
    regressor.n_estimators_
    # 50
    

    I suspect you called .fit() twice, passing the already-fitted model in as init_model the second time, which made LightGBM perform an additional 50 rounds of boosting on top of the existing 50.

    # fit again
    regressor.fit(X, y, init_model=regressor)
    
    regressor.n_estimators
    # 50
    
    regressor.n_estimators_
    # 100
    

    The two attributes have slightly different meanings:

    • .n_estimators = maximum number of boosting rounds to perform the next time fit() is called
    • .n_estimators_ = total number of boosting iterations in the fitted model, including any carried over from init_model