Tags: python-3.x, machine-learning, scikit-learn, lightgbm

LightGBM fit parameters when used in sklearn stacking


I'm using LightGBM with sklearn's stacking method, but I've run into a problem: how can I set certain parameters in the LGBMRegressor.fit call?

This is my code so far:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV
from sklearn.svm import LinearSVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import StackingRegressor
from lightgbm import LGBMRegressor

X, y = load_diabetes(return_X_y=True)
estimators = [
    ('lr', RidgeCV()),
    ('svr', LinearSVR(random_state=42)),
    ('lgb', LGBMRegressor())
]
reg = StackingRegressor(
    estimators=estimators,
    final_estimator=RandomForestRegressor(n_estimators=10,
                                          random_state=42)
)
reg.fit(X, y)

But I want to set num_boost_round and early_stopping_rounds in LGBMRegressor.fit. How can I achieve that when the model is used inside StackingRegressor.fit?

※Note: without the stacking method, I can do this with

lgb = LGBMRegressor()
lgb.fit(X, y, num_boost_round=20000, early_stopping_rounds=1000)

Solution

  • I think the issue is not just that you cannot specify num_boost_round and early_stopping_rounds in fit. Those are not documented fit parameters; if you were using them at all, you would put them in the instantiation call:

    lgb = LGBMRegressor(num_boost_round=20000, early_stopping_rounds=1000)
    

    I think the real problem is that early stopping requires you to pass an evaluation set into the fit() call, and StackingRegressor does not forward per-estimator fit parameters to its base estimators (at least not in the current version).
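
    To see the mismatch concretely, here is a sketch reusing the reg from the question (the exact error text depends on your scikit-learn version):

    # StackingRegressor.fit accepts only X, y and sample_weight; there is no
    # per-estimator routing of fit-time arguments such as eval_set
    reg.fit(X, y)  # works, but the inner LightGBM model trains without early stopping
    # reg.fit(X, y, eval_set=[(X_valid, y_valid)])  # TypeError: unexpected keyword argument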

    You can still get what you want; you just have to wrap your model in a class that supports the API, essentially moving those parameters into the object instantiation (subclassing BaseEstimator so that StackingRegressor can clone the wrapper):

    import lightgbm as ltb
    from sklearn.base import BaseEstimator, RegressorMixin

    class MyWrappedLGBR(BaseEstimator, RegressorMixin):
        def __init__(self, fit_parameters: dict):
            self.fit_parameters = fit_parameters

        def fit(self, X, y):
            # valid_sets is a keyword argument of ltb.train, not a params entry
            params = dict(self.fit_parameters)
            valid_sets = params.pop('valid_sets', None)
            train_set = ltb.Dataset(data=X, label=y)
            if valid_sets is not None:
                # bin the validation data against the training Dataset
                valid_sets = [vs.set_reference(train_set) for vs in valid_sets]
            # keep the fitted booster so predict() can use it
            self.model = ltb.train(params=params, train_set=train_set, valid_sets=valid_sets)
            return self

        def predict(self, X):
            return self.model.predict(X)
    

    And create your estimator as:

    my_params = {
        'num_boost_round': 20000,
        'early_stopping_rounds': 1000,
        'valid_sets': your_validation_set  # a list of lightgbm Dataset objects built from held-out data
    }
    my_lgb = MyWrappedLGBR(my_params)
    

    Then, when StackingRegressor makes calls to fit and predict, it will behave the way you want.
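
    For example, a minimal sketch of wiring the wrapped model into the stack (the train/validation split and the free_raw_data flag are my additions, and the imports from the question are reused):

    import lightgbm as ltb
    from sklearn.model_selection import train_test_split

    X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=42)
    # free_raw_data=False keeps the raw numpy data around so the Dataset survives
    # the cloning that StackingRegressor performs on its estimators
    valid_set = ltb.Dataset(X_valid, label=y_valid, free_raw_data=False)

    my_lgb = MyWrappedLGBR({
        'num_boost_round': 20000,
        'early_stopping_rounds': 1000,
        'valid_sets': [valid_set]
    })

    reg = StackingRegressor(
        estimators=[
            ('lr', RidgeCV()),
            ('svr', LinearSVR(random_state=42)),
            ('lgb', my_lgb)
        ],
        final_estimator=RandomForestRegressor(n_estimators=10, random_state=42)
    )
    reg.fit(X_train, y_train)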

    If you really want to stick to the scikit-learn fit-parameter style and are willing to take the risk of unexpected behavior, you can create a wrapper class more in the vein of that API as well:

    from sklearn.base import BaseEstimator, RegressorMixin

    class MySKLWrappedLGBR(BaseEstimator, RegressorMixin):
        def __init__(self, my_model, fit_parameters: dict):
            # store under the constructor argument's name so BaseEstimator.get_params works
            self.my_model = my_model
            self.fit_parameters = fit_parameters

        def fit(self, X, y):
            # forward the stored keyword arguments to LGBMRegressor.fit
            self.my_model.fit(X, y, **self.fit_parameters)
            return self

        def predict(self, X):
            return self.my_model.predict(X)
    

    Then something like this might work:

    lgb = LGBMRegressor(num_boost_round=20000, early_stopping_rounds=1000)

    # LGBMRegressor.fit expects eval_set (a list of (X, y) tuples) rather than
    # the native API's valid_sets
    my_eval_params = {
        'eval_set': your_validation_set
    }

    my_wrapped_lgb = MySKLWrappedLGBR(lgb, my_eval_params)
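
    The wrapped estimator would then drop into the stack the same way (a sketch reusing the imports and data from the question):

    reg = StackingRegressor(
        estimators=[
            ('lr', RidgeCV()),
            ('svr', LinearSVR(random_state=42)),
            ('lgb', my_wrapped_lgb)
        ],
        final_estimator=RandomForestRegressor(n_estimators=10, random_state=42)
    )
    reg.fit(X, y)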
    

    But again, none of this functionality is officially supported by the scikit-learn API, so it is safer to use the earlier wrapper class built on the Dataset API.