I'm trying to understand scikit-learn Pipelines.
According to a note in the scikit-learn user guide, a Pipeline "has all the methods that the last estimator in the pipeline has".
So I wrote my own estimator class with a method called myfun, used an object of this class as the last step in a new Pipeline instance, and called myfun on the pipeline:
class MyEstimator:
    def __init__(self):
        pass

    def fit(self, X, y):
        return self

    def myfun(self):
        return None

from sklearn.pipeline import make_pipeline

pipe = make_pipeline(MyEstimator())
pipe.myfun()
This resulted in the following error message:
pipe.myfun()
^^^^^^^^^^
AttributeError: 'Pipeline' object has no attribute 'myfun'
Evidently, contrary to the user guide's claim, a pipeline does not have all the methods that its last estimator has.
So I wonder: what methods (more precisely, method signatures) of its last estimator does a pipeline have?
It has a subset of the methods listed in the "Methods" section of the Pipeline API docs; the subset is determined by which methods the final estimator has. (It always has some, e.g. those inherited from BaseEstimator.)
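A quick way to see this in practice is to probe two pipelines with hasattr. This is only an illustrative sketch: the choice of StandardScaler and LogisticRegression as final steps is my own assumption, picked because one is a transformer without predict and the other a classifier without transform.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Final step is a classifier: it offers predict but not transform.
clf_pipe = make_pipeline(StandardScaler(), LogisticRegression())

# Final step is a transformer: it offers transform but not predict.
tf_pipe = make_pipeline(StandardScaler())

# hasattr works on unfitted pipelines, so no data is needed here.
for name in ("predict", "predict_proba", "transform"):
    print(name, hasattr(clf_pipe, name), hasattr(tf_pipe, name))
```

The same method name shows up on one pipeline and is missing on the other, depending only on the final step.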
In the source code, you can identify these methods by the decorator @available_if, e.g.
@available_if(_final_estimator_has("predict"))
def predict(...):
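To illustrate why only a fixed list of methods can ever appear, here is a toy re-creation of the idea in plain Python. It is not scikit-learn's actual code: toy_available_if, toy_final_estimator_has, ToyPipeline, WithPredict and WithMyfun are all hypothetical names made up for this sketch. The point is that the wrapper class has to declare each forwarded method itself, so a custom method like myfun is never forwarded.

```python
import types


class _ConditionalMethod:
    """Descriptor that exposes a method only when check(obj) is true."""

    def __init__(self, fn, check):
        self.fn = fn
        self.check = check

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        if not self.check(obj):
            # Raising AttributeError makes hasattr() report False.
            raise AttributeError(
                f"{type(obj).__name__!r} object has no attribute {self.fn.__name__!r}"
            )
        return types.MethodType(self.fn, obj)


def toy_available_if(check):
    """Hypothetical stand-in for a conditional-method decorator."""
    return lambda fn: _ConditionalMethod(fn, check)


def toy_final_estimator_has(attr):
    """Build a check that looks for attr on the wrapped final estimator."""
    return lambda pipeline: hasattr(pipeline.final_estimator, attr)


class ToyPipeline:
    def __init__(self, final_estimator):
        self.final_estimator = final_estimator

    # Only methods declared here can be forwarded; myfun is not among them.
    @toy_available_if(toy_final_estimator_has("predict"))
    def predict(self, X):
        return self.final_estimator.predict(X)


class WithPredict:
    def predict(self, X):
        return [0 for _ in X]


class WithMyfun:
    def myfun(self):
        return None


print(hasattr(ToyPipeline(WithPredict()), "predict"))  # True
print(hasattr(ToyPipeline(WithMyfun()), "predict"))    # False
print(hasattr(ToyPipeline(WithMyfun()), "myfun"))      # False
```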