python scikit-learn pipeline

Passing parameters to a pipeline's fit() in sklearn


I have a sklearn pipeline with PolynomialFeatures() and LinearRegression() in series. My aim is to fit the data using different degrees of the polynomial features and measure the score for each. This is the code I use:

from sklearn import linear_model, pipeline, preprocessing

steps = [('polynomials', preprocessing.PolynomialFeatures()),
         ('linreg', linear_model.LinearRegression())]
# Named pipe so the variable does not shadow the sklearn.pipeline module
pipe = pipeline.Pipeline(steps=steps)

scores = dict()
for i in range(2, 6):
    params = {'polynomials__degree': i, 'polynomials__include_bias': False}
    #pipe.set_params(**params)
    pipe.fit(X_train, y=yCO_logTrain, **params)
    scores[i] = pipe.score(X_train, yCO_logTrain)

scores

I receive the error - TypeError: fit() got an unexpected keyword argument 'degree'.

Why is this error thrown even though the parameters are named in the format <estimator_name>__<parameter_name>?


Solution

  • As per sklearn.pipeline.Pipeline documentation:

    **fit_params : dict of string -> object
    Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.

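    This means that parameters passed this way are forwarded directly to the .fit() method of step s. If you check the PolynomialFeatures documentation, degree is an argument of the PolynomialFeatures constructor, not of its .fit() method, which is why the call fails with a TypeError. To change a constructor argument on an existing pipeline, call the pipeline's set_params(**params) first and then call fit() without the extra keyword arguments.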

    If you want to try different hyperparameters for the estimators/transformers within a pipeline, you could use GridSearchCV. Here's example code from the linked documentation:

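    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectKBest
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline

    # The wrapped forest is passed positionally, so this works across versions
    calibrated_forest = CalibratedClassifierCV(RandomForestClassifier(n_estimators=10))
    pipe = Pipeline([
        ('select', SelectKBest()),
        ('model', calibrated_forest)])
    # Keys follow <step>__<parameter>; nested estimators add another level.
    # On scikit-learn < 1.2 the inner key is 'model__base_estimator__max_depth'.
    param_grid = {
        'select__k': [1, 2],
        'model__estimator__max_depth': [2, 4, 6, 8]}
    # X, y are your feature matrix and target
    search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)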
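
Applied to the pipeline from the question, the two approaches above might look like the following sketch. The data here is synthetic and only stands in for X_train and yCO_logTrain, which are not shown in the question; the variable names pipe, y_train and best_degree are likewise just illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Assumption: synthetic data standing in for X_train / yCO_logTrain.
rng = np.random.default_rng(0)
X_train = rng.uniform(-1, 1, size=(200, 2))
y_train = (1.0 + X_train[:, 0] ** 2
           - 2.0 * X_train[:, 0] * X_train[:, 1]
           + rng.normal(scale=0.05, size=200))

pipe = Pipeline(steps=[
    ("polynomials", PolynomialFeatures()),
    ("linreg", LinearRegression()),
])

# Option 1: set constructor arguments via set_params(), then call fit()
# with no extra keyword arguments.
scores = {}
for degree in range(2, 6):
    pipe.set_params(polynomials__degree=degree,
                    polynomials__include_bias=False)
    pipe.fit(X_train, y_train)
    scores[degree] = pipe.score(X_train, y_train)  # R^2 on the training data

# Option 2: let GridSearchCV try each degree with 5-fold cross-validation.
search = GridSearchCV(pipe, {"polynomials__degree": [2, 3, 4, 5]}, cv=5)
search.fit(X_train, y_train)
best_degree = search.best_params_["polynomials__degree"]
```

Note that Option 1 scores each degree on the training data, as in the question, so the score can only go up as the degree increases; the cross-validated search in Option 2 is usually the more informative way to pick a degree.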