Tags: python, scikit-learn, pipeline

Getting model attributes from pipeline


I typically get PCA loadings like this:

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_t = pca.fit(X).transform(X)
loadings = pca.components_

If I run PCA using a scikit-learn pipeline:

from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline(steps=[
    ('scaling', StandardScaler()),
    ('pca', PCA(n_components=2)),
])
X_t = pipeline.fit_transform(X)

is it possible to get the loadings?

Simply trying loadings = pipeline.components_ fails:

AttributeError: 'Pipeline' object has no attribute 'components_'

(Also interested in extracting attributes like coef_ from pipelines.)


Solution

  • Did you look at the documentation? See http://scikit-learn.org/dev/modules/pipeline.html, which I feel is pretty clear.

    Update: since scikit-learn 0.21 you can index the pipeline directly with square brackets, using the step name:

    pipeline['pca']
    

    or an integer index:

    pipeline[1]
    

    There are two ways to get to the steps in a pipeline, either using indices or using the string names you gave:

    pipeline.named_steps['pca']
    pipeline.steps[1][1]
    

    Either one gives you the fitted PCA object, from which you can read components_ (the same approach works for coef_ on an estimator step). With named_steps you can also use attribute access with a dot, which allows autocompletion:

    pipeline.named_steps.pca.<tab here gives autocomplete>
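
Putting this together, here is a minimal sketch that reads both components_ and coef_ from a fitted pipeline. The synthetic data, the extra LogisticRegression step and its name 'clf' are assumptions made for illustration; they are not part of the original question.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely for illustration (not from the question).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

pipeline = Pipeline(steps=[
    ('scaling', StandardScaler()),
    ('pca', PCA(n_components=2)),
    ('clf', LogisticRegression()),  # hypothetical final step, to show coef_
])
pipeline.fit(X, y)

# Fitted attributes live on the individual steps, not on the Pipeline itself.
loadings = pipeline.named_steps['pca'].components_  # shape (2, 5)
coefs = pipeline.named_steps['clf'].coef_           # shape (1, 2)

# Positional access returns the same step object as access by name.
same_step = pipeline.steps[1][1] is pipeline.named_steps['pca']
```

On scikit-learn 0.21 or later, pipeline['pca'].components_ should give the same array as the named_steps lookup above.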