I'm using lightgbm with the sklearn stacking method, but I've run into a problem: how can I set some parameters in the LGBMRegressor.fit function?
This is my code so far:
from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV
from sklearn.svm import LinearSVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import StackingRegressor
from lightgbm import LGBMRegressor
X, y = load_diabetes(return_X_y=True)
estimators = [
    ('lr', RidgeCV()),
    ('svr', LinearSVR(random_state=42)),
    ('lgb', LGBMRegressor())
]
reg = StackingRegressor(
    estimators=estimators,
    final_estimator=RandomForestRegressor(n_estimators=10, random_state=42)
)
reg.fit(X, y)
But I want to set num_boost_round and early_stopping_rounds in LGBMRegressor.fit. How can I achieve that when using StackingRegressor.fit?
※Note: without the stacking method, I can do
lgb = LGBMRegressor()
lgb.fit(X, y, num_boost_round=20000, early_stopping_rounds=1000)
I think the issue is not that you cannot specify num_boost_round and early_stopping_rounds in fit. Those parameters are not officially supported there according to the documentation, but if you were going to use them, you would pass them in the instantiation call:
lgb = LGBMRegressor(num_boost_round=20000, early_stopping_rounds=1000)
I think the real problem is that if you are trying to use early stopping, you have to pass evaluation sets into the fit() call, which is definitely not supported (at least not in the current version).
You can still get what you want, you just have to wrap your model into a class that supports the API, essentially moving those parameters to the object instantiation:
import lightgbm as ltb
from sklearn.base import BaseEstimator, RegressorMixin

class MyWrappedLGBR(BaseEstimator, RegressorMixin):
    def __init__(self, fit_parameters: dict = None):
        self.fit_parameters = fit_parameters

    def fit(self, X, y):
        my_data_set = ltb.Dataset(data=X, label=y)
        # keep the trained Booster, otherwise predict() has nothing to call;
        # entries like num_boost_round and valid_sets are keyword arguments
        # of ltb.train, so they are unpacked rather than passed as params
        self.model = ltb.train(params={}, train_set=my_data_set,
                               **(self.fit_parameters or {}))
        return self

    def predict(self, X):
        return self.model.predict(X)
And create your estimator as:
my_params = {
    'num_boost_round': 20000,
    'early_stopping_rounds': 1000,
    'valid_sets': your_validation_set
}
my_lgb = MyWrappedLGBR(my_params)
Then, when StackingRegressor makes calls to fit and predict, it will behave the way you want.
If you really want to stick to the sklearn API and are willing to take the risk that you will get unexpected behavior, you can create a wrapper class more in the vein of that API as well:
from sklearn.base import BaseEstimator, RegressorMixin

class MySKLWrappedLGBR(BaseEstimator, RegressorMixin):
    def __init__(self, model=None, fit_parameters: dict = None):
        self.model = model
        self.fit_parameters = fit_parameters

    def fit(self, X, y):
        self.model.fit(X, y, **(self.fit_parameters or {}))
        return self

    def predict(self, X):
        return self.model.predict(X)
Then something like this *might* work:
lgb = LGBMRegressor(num_boost_round=20000, early_stopping_rounds=1000)
my_eval_params = {
    'valid_sets': your_validation_set
}
my_wrapped_lgb = MySKLWrappedLGBR(lgb, my_eval_params)
But again, none of this functionality is officially supported in the sklearn API, so it is better to use the earlier wrapper class that goes through the Dataset API.