Why does sklearn pipeline.set_params() not work?


I have the following pipeline:

from sklearn.pipeline import Pipeline
import lightgbm as lgb


steps_lgb = [('lgb', lgb.LGBMClassifier())]
 
# Create the pipeline: composed of preprocessing steps and estimators
pipe = Pipeline(steps_lgb)

Now I want to set the parameters of the classifier using the following command:

best_params = {'boosting_type': 'dart',
 'colsample_bytree': 0.7332216010898506,
 'feature_fraction': 0.922329814019706,
 'learning_rate': 0.046566283755421566,
 'max_depth': 7,
 'metric': 'auc',
 'min_data_in_leaf': 210,
 'num_leaves': 61,
 'objective': 'binary',
 'reg_lambda': 0.5185517505019249,
 'subsample': 0.5026815575448366}

pipe.set_params(**best_params)

This however raises an error:

ValueError: Invalid parameter boosting_type for estimator Pipeline(steps=[('estimator', LGBMClassifier())]). Check the list of available parameters with `estimator.get_params().keys()`.

boosting_type is definitely a core parameter of the lightgbm framework; if I remove it from best_params, however, the other parameters raise the same ValueError.

So, what I want is to set the parameters of the classifier after a pipeline is created.


Solution

  • When using pipelines, you need to prefix each parameter with the name of the pipeline step it refers to (here lgb), followed by a double underscore (lgb__). The fact that your pipeline consists of only a single step does not change this requirement.

    So your parameters should look like this (only the first two entries shown):

    best_params = {'lgb__boosting_type': 'dart',
                   'lgb__colsample_bytree': 0.7332216010898506
                  }
    

    You would have discovered this yourself if you had followed the advice clearly offered in your error message:

    Check the list of available parameters with `estimator.get_params().keys()`.
    

    In your case,

    pipe.get_params().keys()
    

    gives

    dict_keys(['memory',
               'steps', 
               'verbose',
               'lgb',
               'lgb__boosting_type',
               'lgb__class_weight',
               'lgb__colsample_bytree',
               'lgb__importance_type', 
               'lgb__learning_rate',
               'lgb__max_depth',
               'lgb__min_child_samples',
               'lgb__min_child_weight',
               'lgb__min_split_gain',
               'lgb__n_estimators',
               'lgb__n_jobs',
               'lgb__num_leaves',
               'lgb__objective',
               'lgb__random_state',
               'lgb__reg_alpha',
               'lgb__reg_lambda',
               'lgb__silent', 
               'lgb__subsample',
               'lgb__subsample_for_bin',
               'lgb__subsample_freq'])
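
The prefixing described above can also be done programmatically instead of renaming each key by hand. The following is a minimal sketch, not part of the original answer: prefix_params is a hypothetical helper name, and the step name lgb is the one used in the pipeline from the question.

```python
def prefix_params(step_name, params):
    """Return a copy of params with every key prefixed by '<step_name>__'."""
    return {f"{step_name}__{key}": value for key, value in params.items()}


# Two of the tuned values from the question, used here for illustration.
best_params = {"boosting_type": "dart", "num_leaves": 61}

# 'lgb' is the step name given in steps_lgb = [('lgb', lgb.LGBMClassifier())].
prefixed = prefix_params("lgb", best_params)
print(prefixed)
```

With the pipeline from the question, the result can then be passed as pipe.set_params(**prefixed). Alternatively, the unprefixed dictionary can be applied to the step itself with pipe.named_steps['lgb'].set_params(**best_params).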