Tags: python, machine-learning, scikit-learn, cross-validation, scikit-learn-pipeline

return coefficients from Pipeline object in sklearn


I've fit a Pipeline object with RandomizedSearchCV:

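import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Two named steps: 'scl' standardizes the features, 'clf' is the linear classifier
pipe_sgd = Pipeline([('scl', StandardScaler()),
                    ('clf', SGDClassifier(n_jobs=-1))])

# Keys use the '<step name>__<parameter>' convention.
# Newer scikit-learn releases spell 'log' as 'log_loss' and replace n_iter with max_iter.
param_dist_sgd = {'clf__loss': ['log'],
                 'clf__penalty': [None, 'l1', 'l2', 'elasticnet'],
                 'clf__alpha': np.linspace(0.15, 0.35),
                 'clf__n_iter': [3, 5, 7]}

sgd_randomized_pipe = RandomizedSearchCV(estimator=pipe_sgd,
                                         param_distributions=param_dist_sgd,
                                         cv=3, n_iter=30, n_jobs=-1)

sgd_randomized_pipe.fit(X_train, y_train)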

I want to access the coef_ attribute of the best_estimator_ but I'm unable to do that. I've tried accessing coef_ with the code below.

sgd_randomized_pipe.best_estimator_.coef_

However, I get the following AttributeError:

AttributeError: 'Pipeline' object has no attribute 'coef_'

The scikit-learn docs say that coef_ is an attribute of SGDClassifier, which is the classifier step in my pipeline.

What am I doing wrong?


Solution

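  • best_estimator_ is the refitted Pipeline itself, and a Pipeline does not forward coef_ from its final step. You can reach each step by the name you gave it when building the pipeline, through the named_steps dict.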

    scaler = sgd_randomized_pipe.best_estimator_.named_steps['scl']
    classifier = sgd_randomized_pipe.best_estimator_.named_steps['clf']
    

    You can then read any attribute of the corresponding fitted estimator, such as coef_ and intercept_ on the classifier.

    named_steps is a documented attribute of Pipeline:

    named_steps : dict

    Read-only attribute to access any step parameter by user given name. Keys are step names and values are steps parameters.
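
The approach above can be sketched end to end as follows. This is a minimal, self-contained example rather than the asker's exact setup: the synthetic data from make_classification, the variable names search and best_pipe, the default SGDClassifier loss, and the smaller search (n_iter=5, no n_jobs) are all assumptions made so the snippet runs on its own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data (an assumption; stands in for X_train, y_train).
X_train, y_train = make_classification(n_samples=200, n_features=5, random_state=0)

pipe_sgd = Pipeline([("scl", StandardScaler()),
                     ("clf", SGDClassifier(random_state=0))])

# Parameters of a step are addressed as '<step name>__<parameter>'.
param_dist_sgd = {"clf__penalty": ["l1", "l2", "elasticnet"],
                  "clf__alpha": np.linspace(0.15, 0.35)}

search = RandomizedSearchCV(pipe_sgd, param_distributions=param_dist_sgd,
                            cv=3, n_iter=5, random_state=0)
search.fit(X_train, y_train)

# best_estimator_ is the refitted Pipeline, so go through named_steps
# to reach the fitted objects inside it.
best_pipe = search.best_estimator_
scaler = best_pipe.named_steps["scl"]
classifier = best_pipe.named_steps["clf"]

print(classifier.coef_.shape)       # (1, n_features) for a binary problem
print(classifier.intercept_.shape)  # one intercept for a binary problem
print(scaler.mean_.shape)           # one mean per input feature
```

Note that the coefficients apply to the standardized features produced by the 'scl' step, not to the raw inputs. In recent scikit-learn releases the same step can also be reached by indexing the pipeline, for example best_pipe["clf"].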