Tags: scikit-learn, random-forest, one-hot-encoding, grid-search

Invalid parameter imputer for estimator Pipeline: Check the list of available parameters with `estimator.get_params().keys()`


When I try to run a grid search over a RandomForestRegressor with Pipeline and param_grid:

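from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column lists (the numerical ones are as shown in the error message below)
numerical_columns = ['lotSize', 'age', 'landValue', 'livingArea', 'pctCollege',
                     'bedrooms', 'fireplaces', 'bathrooms', 'rooms']
nominal_columns = ['heating', 'fuel', 'sewer', 'waterfront', 'newConstruction', 'centralAir']
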
numerical_pipeline = Pipeline([('imputer', SimpleImputer(strategy='mean')),
                               ('scaler', StandardScaler())])
nominal_pipeline = Pipeline([('imputer', SimpleImputer(strategy='most_frequent')),
                             ('encoder', OneHotEncoder(handle_unknown='ignore'))])

preprocessor = ColumnTransformer([
    ('numerical_transformer', numerical_pipeline, numerical_columns),
    ('nominal_transformer', nominal_pipeline, nominal_columns),
])

pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('regressor', RandomForestRegressor(random_state=0))
    ])
  
model = pipeline.fit(X_train, y_train)

param_grid = [
    {'imputer__strategy': ['mean', 'median'],
     'regressor__n_estimators': [3, 10, 30],
     'regressor__max_features': [2, 4, 6]},

    {'imputer__strategy': ['mean', 'median'],
     'regressor__bootstrap': [False],
     'regressor__n_estimators': [3, 10],
     'regressor__max_features': [2, 3, 4]},
     ]
gridSearch = GridSearchCV(model, param_grid, cv=3,
                           scoring='neg_mean_squared_error',
                           return_train_score=True)

I get this error:

ValueError: Invalid parameter imputer for estimator Pipeline(steps=[('preprocessor',
                 ColumnTransformer(transformers=[('numerical_transformer',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer()),
                                                                  ('scaler',
                                                                   StandardScaler())]),
                                                  ['lotSize', 'age',
                                                   'landValue', 'livingArea',
                                                   'pctCollege', 'bedrooms',
                                                   'fireplaces', 'bathrooms',
                                                   'rooms']),
                                                 ('nominal_transformer',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer(strategy='most_frequent')),
                                                                  ('encoder',
                                                                   OneHotEncoder(handle_unknown='ignore'))]),
                                                  ['heating', 'fuel', 'sewer',
                                                   'waterfront',
                                                   'newConstruction',
                                                   'centralAir'])])),
                ('regressor', RandomForestRegressor(random_state=0))]). Check the list of available parameters with `estimator.get_params().keys()`.

I've been reading the documentation for the past hour and still haven't found a solution. Is there a problem with my preprocessor? I tried changing the strategy from most_frequent to mean, but then I get a "cannot convert string to float" error.


Solution

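  • You've misspecified one of the hyperparameters, imputer__strategy. Your model is a pipeline containing a column transformer, which in turn contains pipelines, so a parameter key has to name every level on the way down, joined by double underscores. Since 'mean' and 'median' only make sense for the numerical imputer (applying them to string columns is what produces the string-to-float error), I believe you need

    preprocessor__numerical_transformer__imputer__strategy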
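
To make the nested naming concrete, here is a minimal sketch of the grid search with the full key. The toy DataFrame, its four columns, and the reduced grid values are assumptions made up for illustration so the snippet runs on its own; the step names match the question. It also uses `get_params()` to confirm that the key exists before searching, which is the check the error message suggests.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for the asker's X_train / y_train.
rng = np.random.default_rng(0)
n = 60
X_train = pd.DataFrame({
    'lotSize': rng.normal(1.0, 0.3, n),
    'age': rng.integers(1, 80, n).astype(float),
    'heating': rng.choice(['hot air', 'electric'], n),
    'fuel': rng.choice(['gas', 'oil'], n),
})
X_train.loc[::10, 'lotSize'] = np.nan  # a few missing values for the imputer
y_train = rng.normal(200_000, 50_000, n)

numerical_columns = ['lotSize', 'age']
nominal_columns = ['heating', 'fuel']

# Same structure and step names as in the question.
numerical_pipeline = Pipeline([('imputer', SimpleImputer(strategy='mean')),
                               ('scaler', StandardScaler())])
nominal_pipeline = Pipeline([('imputer', SimpleImputer(strategy='most_frequent')),
                             ('encoder', OneHotEncoder(handle_unknown='ignore'))])
preprocessor = ColumnTransformer([
    ('numerical_transformer', numerical_pipeline, numerical_columns),
    ('nominal_transformer', nominal_pipeline, nominal_columns),
])
pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('regressor', RandomForestRegressor(random_state=0)),
])

# List every tunable key that ends in imputer__strategy.
strategy_keys = sorted(k for k in pipeline.get_params()
                       if k.endswith('imputer__strategy'))
print(strategy_keys)

# The full path exists; the bare 'imputer__strategy' does not.
imputer_key = 'preprocessor__numerical_transformer__imputer__strategy'
assert imputer_key in pipeline.get_params()
assert 'imputer__strategy' not in pipeline.get_params()

# Reduced grid (illustrative values) using the full key.
param_grid = [
    {imputer_key: ['mean', 'median'],
     'regressor__n_estimators': [3, 10],
     'regressor__max_features': [2, 4]},
]
grid_search = GridSearchCV(pipeline, param_grid, cv=3,
                           scoring='neg_mean_squared_error',
                           return_train_score=True)
grid_search.fit(X_train, y_train)
print(grid_search.best_params_)
```

Note that the sketch passes the unfitted pipeline to GridSearchCV; the search clones and refits the estimator for every parameter combination, so the earlier `pipeline.fit(X_train, y_train)` call is not needed for the search itself.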