python scikit-learn scikit-learn-pipeline

What methods of its last estimator does a scikit-learn pipeline have?


I'm trying to understand scikit-learn Pipelines.

According to a Note in the scikit-learn user guide, a Pipeline "has all the methods that the last estimator in the pipeline has".
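For the standard estimator methods the note does hold: whether a pipeline exposes, say, predict or transform depends on its final step. A minimal check, assuming a recent scikit-learn; the two pipelines below are illustrative choices, not taken from the original post:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Pipeline ending in a classifier: prediction methods should be exposed.
clf_pipe = make_pipeline(StandardScaler(), LogisticRegression())
# Pipeline ending in a transformer: transform should be exposed instead.
tr_pipe = make_pipeline(StandardScaler())

for name, p in [("clf_pipe", clf_pipe), ("tr_pipe", tr_pipe)]:
    # No fitting needed: availability is decided by the final step's methods.
    exposed = {m: hasattr(p, m) for m in ("predict", "predict_proba", "transform")}
    print(name, exposed)
```

This should report predict and predict_proba as available only on clf_pipe, and transform only on tr_pipe.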

So I wrote my own estimator class with a method called myfun, used an object of this class as the last step in a new Pipeline instance, and called myfun on it:

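class MyEstimator:
    def __init__(self):
        pass
    def fit(self, X, y):
        return self
    def myfun(self):
        # custom method that is not part of the standard estimator API
        return None

from sklearn.pipeline import make_pipeline
pipe = make_pipeline(MyEstimator())
pipe.myfun()  # expected to be forwarded to MyEstimator.myfun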

This resulted in the following error message:

    pipe.myfun()
    ^^^^^^^^^^
AttributeError: 'Pipeline' object has no attribute 'myfun'

Evidently, contrary to the user guide's claim, a pipeline does not have all the methods that its last estimator has.
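The discrepancy can be confirmed without triggering the exception: hasattr is False on the pipeline but True on its final step, which is still reachable by integer indexing (pipe[-1] returns the last step's estimator). A minimal sketch reusing the MyEstimator class from the question:

```python
from sklearn.pipeline import make_pipeline


class MyEstimator:
    """The custom estimator from the question."""

    def fit(self, X, y):
        return self

    def myfun(self):
        return None


pipe = make_pipeline(MyEstimator())

# The pipeline does not forward arbitrary attributes to its last step...
print(hasattr(pipe, "myfun"))      # False
# ...but the last step itself is accessible by integer indexing.
print(hasattr(pipe[-1], "myfun"))  # True
print(pipe[-1].myfun())            # None
```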

So I wonder: what methods (more precisely, method signatures) of its last estimator does a pipeline have?


Solution

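  • It has a subset of the methods listed in the "Methods" section of the Pipeline API docs; which subset depends on which of those methods the final estimator has. Arbitrary methods such as myfun are never forwarded. (Some methods are always available regardless of the final step, e.g. fit, get_params and set_params.)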

    In the source code, the conditionally available methods are the ones decorated with @available_if, e.g.

    @available_if(_final_estimator_has("predict"))
    def predict(...):
    

    https://github.com/scikit-learn/scikit-learn/blob/d99b728b3a7952b2111cf5e0cb5d14f92c6f3a80/sklearn/pipeline.py#L483