python machine-learning scikit-learn pipeline attributeerror

Python raises an AttributeError when methods on the sklearn Pipeline object are called

Problem

I am calling the fit_transform() and transform() methods on a Pipeline object, but Python is raising an AttributeError whenever I try to do so. Here is what I'm trying to run, with imports. (Note: train/test splitting has been done already)

from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

pipe = Pipeline([('mean_impute', SimpleImputer()), 
                 ('norm',        StandardScaler()), 
                 ('sklearn_lm',  LinearRegression())])

pipe.fit_transform(x_train, y_train)  #<-- error here

x_transform = pipe.transform(x_test)  #<-- and here if previous line is absent

The text of the error is as follows:

AttributeError: This 'Pipeline' has no attribute 'fit_transform'

What went wrong? I'm sure it's something simple.

Things I have tried:

Looked over the documentation for scikit-learn to confirm that these methods exist for the Pipeline object in sklearn
Checked the sizes of x_train and y_train to make sure they were the same, and that they both had headers
Reinstalled scikit-learn

Solution

The documentation for sklearn.pipeline.Pipeline.fit_transform states that it is "[o]nly valid if the final estimator either implements fit_transform or fit and transform." The wording is a bit ambiguous, but it describes two possibilities: (i) the final estimator implements fit_transform, or (ii) the final estimator implements both fit and transform.
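As a rough way to see that condition in practice, the sketch below checks an estimator for the relevant methods with hasattr. The helper name supports_fit_transform is hypothetical and written only for this illustration; it is not part of scikit-learn, and it only approximates the check the library performs internally.

```python
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


def supports_fit_transform(estimator):
    """Hypothetical helper: True if either condition quoted from the docs holds."""
    has_fit_transform = hasattr(estimator, 'fit_transform')
    has_fit_and_transform = (hasattr(estimator, 'fit')
                             and hasattr(estimator, 'transform'))
    return has_fit_transform or has_fit_and_transform


# A transformer such as StandardScaler satisfies the condition...
scaler_ok = supports_fit_transform(StandardScaler())
# ...while a regressor such as LinearRegression does not.
regressor_ok = supports_fit_transform(LinearRegression())

print(scaler_ok, regressor_ok)  # True False
```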

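Your final estimator is sklearn.linear_model.LinearRegression, which implements fit but not transform (or fit_transform), so neither condition is met and the AttributeError is raised. The same applies to pipe.transform(x_test). Since the pipeline ends in a regressor, call pipe.fit(x_train, y_train) and then pipe.predict(x_test) instead.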
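A minimal sketch of two ways around the error, using the same pipeline as in the question. The data here is made up with NumPy purely so the example runs on its own (the real x_train, y_train and x_test are not shown in the question). Option 2 assumes a scikit-learn version recent enough to support slicing a Pipeline.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Stand-in data (assumption): random features and a noisy linear target.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(80, 3))
y_train = x_train @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=80)
x_test = rng.normal(size=(20, 3))
x_train[0, 0] = np.nan  # one missing value so the imputer has something to do

pipe = Pipeline([('mean_impute', SimpleImputer()),
                 ('norm',        StandardScaler()),
                 ('sklearn_lm',  LinearRegression())])

# Option 1: the pipeline ends in a regressor, so fit it and then predict.
pipe.fit(x_train, y_train)
y_pred = pipe.predict(x_test)

# Option 2: if the preprocessed features themselves are needed,
# slice off the final step and use only the transformer steps.
preprocess = pipe[:-1]
x_train_transformed = preprocess.fit_transform(x_train)
x_test_transformed = preprocess.transform(x_test)
```

Option 1 is usually what is wanted when the goal is a fitted model. Option 2 is useful for inspecting what the imputer and scaler produce before the regression step.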