Grid search with LightGBM regression


I want to train a regression model using LightGBM, and the following code works fine:

import lightgbm as lgb
from sklearn.metrics import mean_absolute_error

d_train = lgb.Dataset(X_train, label=y_train)
params = {}
params['learning_rate'] = 0.1
params['boosting_type'] = 'gbdt'
params['objective'] = 'gamma'
params['metric'] = 'l1'
params['sub_feature'] = 0.5
params['num_leaves'] = 40
params['min_data'] = 50
params['max_depth'] = 30

lgb_model = lgb.train(params, d_train, 1000)

# Prediction
y_pred = lgb_model.predict(X_test)
mae_error = mean_absolute_error(y_test, y_pred)

print(mae_error)

But when I move on to GridSearchCV, I run into problems. I am not sure how to set it up correctly. I found useful sources, for example here, but they seem to be working with a classifier.

1st try:

from sklearn.metrics import make_scorer, mean_absolute_error
from sklearn.model_selection import GridSearchCV
score_func = make_scorer(mean_absolute_error, greater_is_better=False)

model = lgb.LGBMClassifier( 
    boosting_type="gbdt",
    objective='regression',
    is_unbalance=True, 
    random_state=10, 
    n_estimators=50,
    num_leaves=30, 
    max_depth=8,
    feature_fraction=0.5,  
    bagging_fraction=0.8, 
    bagging_freq=15, 
    learning_rate=0.01,    
)

params_opt = {'n_estimators':range(200, 600, 80), 'num_leaves':range(20,60,10)}
gridSearchCV = GridSearchCV(estimator = model, 
    param_grid = params_opt, 
    scoring=score_func)
gridSearchCV.fit(X_train,y_train)
gridSearchCV.grid_scores_, gridSearchCV.best_params_, gridSearchCV.best_score_

This gives me a bunch of errors, ending with:

"ValueError: Unknown label type: 'continuous'"

UPDATE: I got the code to run by replacing LGBMClassifier with LGBMModel. Should I try LGBMRegressor too, or does it not matter? (source: https://lightgbm.readthedocs.io/en/latest/_modules/lightgbm/sklearn.html)


Solution

  • First of all, it is unclear what the nature of your data is and thus which type of model fits better. You use the L1 metric, so I assume you have some sort of regression problem. If not, please correct me and explain why you use the L1 metric. If so, it is unclear why you use LGBMClassifier at all, since it is meant for classification problems (as @bakka has already pointed out). The "Unknown label type: 'continuous'" error is raised because a classifier expects discrete class labels, not continuous targets.

    Note that in practice LGBMModel is the same as LGBMRegressor (you can see it in the code). However, there is no guarantee that this will remain so in the long term. So if you want to write good, maintainable code, do not use the base class LGBMModel unless you know very well what you are doing, why, and what the consequences are.

    Regarding the parameter ranges: see this answer on GitHub.