I'm trying to build a pipeline that contains a pre-processing transformer (it simply removes columns from the data) and an LDA classifier. I want to tune hyperparameters for each, and from looking at other posts and the documentation I should just need to use pipelineName__param, with the double underscore, but that doesn't seem to be working.
pp.pprint(sorted(full_pipeline.get_params().keys()))
>>> [...] #lists all possible params for pipeline which I copied into param_grid
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
clf_model = LinearDiscriminantAnalysis()
full_pipeline = Pipeline([
    ('preprocessing', pp_pipeline),
    ('model', clf_model),
])
param_grid = {
    "preprocessing__dropper__drop_attr": [True, False],
    "model__solver": ["svd", "lsqr", "eigen"],
}
search = GridSearchCV(clf_model, param_grid, scoring="f1", return_train_score=True, cv=5, verbose=2, n_jobs=-1)
search.fit(X_train, y_train)
pp_pipeline is a pipeline that contains the transformer that drops columns and a standard scaler. I have tested it on the X_train data alone and it works as expected.
The code block above raises this error:
ValueError: Invalid parameter 'model' for estimator LinearDiscriminantAnalysis().
Why is it trying to treat model as a parameter and not as the pipeline step name, even though I've named it appropriately using a double underscore? I've tried renaming model to something else, and even taking "model__solver" out of param_grid entirely. If I do that, I instead get the error
ValueError: Invalid parameter 'preprocessing' for estimator LinearDiscriminantAnalysis().
so I must be missing something key here.
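I believe the issue is that you are passing the model (clf_model, the LDA estimator) to GridSearchCV, not your pipeline. The keys in param_grid are resolved against whatever estimator you pass in, and LinearDiscriminantAnalysis on its own has no parameter called model or preprocessing. Your GridSearchCV call should be: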
search = GridSearchCV(full_pipeline, param_grid,
                      scoring="f1", return_train_score=True, cv=5, verbose=2, n_jobs=-1)
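If it helps, here is a minimal, self-contained sketch of the corrected setup. I don't have your transformer or data, so ColumnDropper is a hypothetical stand-in (it drops the last column, an arbitrary choice) and the data comes from make_classification; only the step names and the param_grid keys are taken from your post. It also checks the grid keys against full_pipeline.get_params() before searching, which is a quick way to catch this kind of mismatch.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ColumnDropper(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for the column-dropping transformer in the question."""

    def __init__(self, drop_attr=True):
        self.drop_attr = drop_attr

    def fit(self, X, y=None):
        # Nothing to learn; the transformer is stateless.
        return self

    def transform(self, X):
        X = np.asarray(X)
        # Assumption for illustration: "dropping" means removing the last column.
        return X[:, :-1] if self.drop_attr else X


# Synthetic binary classification data (an assumption; replaces X_train, y_train).
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=4, n_redundant=0, random_state=0
)

pp_pipeline = Pipeline([
    ("dropper", ColumnDropper()),
    ("scaler", StandardScaler()),
])
full_pipeline = Pipeline([
    ("preprocessing", pp_pipeline),
    ("model", LinearDiscriminantAnalysis()),
])

param_grid = {
    "preprocessing__dropper__drop_attr": [True, False],
    "model__solver": ["svd", "lsqr", "eigen"],
}

# Every grid key must be a parameter of the estimator handed to GridSearchCV.
valid_names = full_pipeline.get_params().keys()
missing = [name for name in param_grid if name not in valid_names]
print("grid keys not found on the pipeline:", missing)

# Pass the pipeline, not the bare classifier.
search = GridSearchCV(full_pipeline, param_grid, scoring="f1", cv=5)
search.fit(X, y)
print(search.best_params_)
```

Running the same membership check against clf_model.get_params() instead would list both grid keys as missing, which matches the ValueError you are seeing.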