Tags: python, scikit-learn, grid-search, lightgbm

GridSearchCV with LightGBM requires a fit() method that lgb.train() does not provide?


I am trying to run a GridSearchCV from sklearn on a LightGBM estimator, but I am running into problems when building the search.

My code looks like this:

d_train = lgb.Dataset(X_train, label=y_train)
params = {}
params['learning_rate'] = 0.003
params['boosting_type'] = 'gbdt'
params['objective'] = 'binary'
params['metric'] = 'binary_logloss'
params['sub_feature'] = 0.5
params['num_leaves'] = 10
params['min_data'] = 50
params['max_depth'] = 10

clf = lgb.train(params, d_train, 100)

param_grid = {
    'num_leaves': [10, 31, 127],
    'boosting_type': ['gbdt', 'rf'],
    'learning rate': [0.1, 0.001, 0.003]
    }


gsearch = GridSearchCV(estimator=clf, param_grid=param_grid)
lgb_model = gsearch.fit(X=train, y=y)

However, I get the following error:

TypeError: estimator should be an estimator implementing 'fit' method, 
          <lightgbm.basic.Booster object at 0x0000014C55CA2880> was passed

LightGBM, however, is trained with the train() method rather than fit(). Does this mean grid search cannot be used with it?

Thanks


Solution

  • lgb.train() returns a Booster object, which does not implement the scikit-learn estimator API (it has no fit() method). This is why GridSearchCV rejects it.

    However, the lightgbm package offers classes that are compliant with the scikit-learn API. Depending on which supervised learning task you are trying to accomplish, classification or regression, use either LGBMClassifier or LGBMRegressor. An example for a classification task:

    from lightgbm import LGBMClassifier
    from sklearn.model_selection import GridSearchCV
    
    
    clf = LGBMClassifier()
    param_grid = {
        'num_leaves': [10, 31, 127],
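        # Note: 'rf' needs row or feature subsampling enabled
        # (e.g. subsample < 1 with subsample_freq > 0), otherwise those fits fail.
        'boosting_type': ['gbdt', 'rf'],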
        'learning_rate': [0.1, 0.001, 0.003]  # underscore, not a space
    }
    
    gsearch = GridSearchCV(estimator=clf, param_grid=param_grid)
    gsearch.fit(X_train, y_train)