Why does sklearn pipeline.set_params() not work?

I have the following pipeline:

from sklearn.pipeline import Pipeline
import lightgbm as lgb


steps_lgb = [('lgb', lgb.LGBMClassifier())]
 
# Create the pipeline: composed of preprocessing steps and estimators
pipe = Pipeline(steps_lgb)

Now I want to set the parameters of the classifier using the following command:

best_params = {'boosting_type': 'dart',
 'colsample_bytree': 0.7332216010898506,
 'feature_fraction': 0.922329814019706,
 'learning_rate': 0.046566283755421566,
 'max_depth': 7,
 'metric': 'auc',
 'min_data_in_leaf': 210,
 'num_leaves': 61,
 'objective': 'binary',
 'reg_lambda': 0.5185517505019249,
 'subsample': 0.5026815575448366}

pipe.set_params(**best_params)

This however raises an error:

ValueError: Invalid parameter boosting_type for estimator Pipeline(steps=[('estimator', LGBMClassifier())]). Check the list of available parameters with `estimator.get_params().keys()`.

boosting_type is definitely a core parameter of the lightgbm framework. If I remove it from best_params, however, the other parameters cause the same ValueError to be raised.

So, what I want is to set the parameters of the classifier after a pipeline is created.

Solution

When using pipelines, you need to prefix each parameter with the name of the pipeline step it belongs to (here lgb), followed by a double underscore, giving lgb__. The fact that your pipeline consists of only a single step does not change this requirement.

So, your parameters should look like this (only the first 2 elements shown):

best_params = {'lgb__boosting_type': 'dart',
               'lgb__colsample_bytree': 0.7332216010898506
              }
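
If you would rather not retype every key, the prefix can be added programmatically. The following is a minimal sketch, not the only way to do it: prefix_params is a hypothetical helper name (not part of scikit-learn), and only a subset of the original best_params is shown.

```python
# Subset of the tuned parameters from the question, without any prefix.
best_params = {
    "boosting_type": "dart",
    "colsample_bytree": 0.7332216010898506,
    "learning_rate": 0.046566283755421566,
    "max_depth": 7,
    "num_leaves": 61,
}


def prefix_params(step_name, params):
    """Return a copy of params with every key prefixed by '<step_name>__'.

    Hypothetical helper for illustration; step_name must match the name
    given to the step when the Pipeline was built ('lgb' in the question).
    """
    return {f"{step_name}__{key}": value for key, value in params.items()}


prefixed_params = prefix_params("lgb", best_params)
print(sorted(prefixed_params))

# With the pipeline from the question, the call would then be:
# pipe.set_params(**prefixed_params)
```

The step name passed to the helper has to be exactly the one used in steps_lgb, since that is what the pipeline uses to route each parameter to the right step.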

The error message itself tells you how to find this out:

Check the list of available parameters with `estimator.get_params().keys()`.

In your case,

pipe.get_params().keys()

gives

dict_keys(['memory',
           'steps', 
           'verbose',
           'lgb',
           'lgb__boosting_type',
           'lgb__class_weight',
           'lgb__colsample_bytree',
           'lgb__importance_type', 
           'lgb__learning_rate',
           'lgb__max_depth',
           'lgb__min_child_samples',
           'lgb__min_child_weight',
           'lgb__min_split_gain',
           'lgb__n_estimators',
           'lgb__n_jobs',
           'lgb__num_leaves',
           'lgb__objective',
           'lgb__random_state',
           'lgb__reg_alpha',
           'lgb__reg_lambda',
           'lgb__silent', 
           'lgb__subsample',
           'lgb__subsample_for_bin',
           'lgb__subsample_freq'])
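
The same get_params() listing can be used to check a parameter dictionary before calling set_params(). Here is a small sketch of that idea. It assumes a stand-in pipeline with a single LogisticRegression step named clf, chosen only so the example runs without lightgbm installed; the candidate dictionary is made up for illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Stand-in pipeline (assumption): one step named 'clf'.
pipe = Pipeline([("clf", LogisticRegression())])

# Illustrative candidates: 'C' lacks the step prefix, the others have it.
candidate = {"clf__C": 0.5, "C": 0.5, "clf__max_iter": 200}

# Keys the pipeline reports as settable, e.g. 'clf__C', 'clf__max_iter'.
valid = pipe.get_params().keys()

# Keep only the keys the pipeline lists; collect the rest for inspection.
accepted = {k: v for k, v in candidate.items() if k in valid}
rejected = sorted(set(candidate) - set(accepted))

pipe.set_params(**accepted)
print("rejected:", rejected)
print("C is now:", pipe.named_steps["clf"].C)
```

Any key that ends up in rejected is one that set_params() on the pipeline would have refused, which is usually a sign that the step prefix is missing or misspelled.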