machine-learning, scikit-learn, classification, feature-selection, multilabel-classification

How to get feature importances of a multi-label classification problem?


I am learning how to use Scikit-Learn and I am trying to get the feature importance of a multi-label classification problem. I defined and trained the model in the following way:

classifier = OneVsRestClassifier(
    make_pipeline(RandomForestClassifier(random_state=42))
)
classifier.classes_ = classes
y_train_pred = cross_val_predict(classifier, X_train_prepared, y_train, cv=3)

The code seems to be working fine until I try to get the feature importance. This is what I tried:

classifier.feature_importances_

But I get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-98-a9c91f6f2504> in <module>
----> 1 classifier.feature_importances_

AttributeError: 'OneVsRestClassifier' object has no attribute 'feature_importances_'

I also tried the solution proposed in this question, but I think it is outdated. Could you suggest a newer, elegant way to display the feature importances of a multi-label classification problem?


Solution

  • I would say that the solution in the referenced post is not outdated; rather, your setup is slightly different.

    The estimator that you're passing to OneVsRestClassifier is a Pipeline; in the referenced post it was a RandomForestClassifier directly.

    Therefore you'll have to access one of the pipeline's steps to reach the underlying RandomForestClassifier instance, on which you'll finally be able to access the feature_importances_ attribute. Here is one way of proceeding:

    classifier.estimators_[0].named_steps['randomforestclassifier'].feature_importances_
    

    Finally, be aware that you'll have to fit your OneVsRestClassifier instance before you can access its estimators_ attribute. Although cross_val_predict does fit the estimator internally (as you can see here), it fits clones of it and does not return a fitted estimator, unlike the .fit() method. Therefore, after cross_val_predict returns, classifier itself is still unfitted, which is why you cannot access the estimators_ attribute.
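    A quick way to see this cloning behavior for yourself is to check for the estimators_ attribute before and after calling .fit() explicitly (a minimal sketch, using the iris data purely for illustration):

    ```python
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.model_selection import cross_val_predict

    X, y = load_iris(return_X_y=True)
    clf = OneVsRestClassifier(RandomForestClassifier(random_state=42))

    # cross_val_predict fits clones internally; clf itself stays unfitted.
    cross_val_predict(clf, X, y, cv=3)
    print(hasattr(clf, 'estimators_'))  # False

    # Only an explicit .fit() attaches the fitted per-class estimators to clf.
    clf.fit(X, y)
    print(hasattr(clf, 'estimators_'))  # True
    ```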

    Here is a toy example:

    from sklearn import datasets
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_predict
    
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
    
    classifier = OneVsRestClassifier(
        make_pipeline(RandomForestClassifier(random_state=42))
    )
    
    classifier.fit(X_train, y_train)
    y_train_pred = cross_val_predict(classifier, X_train, y_train, cv=3) 
    classifier.estimators_[0].named_steps['randomforestclassifier'].feature_importances_
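
    Note that classifier.estimators_[0] only gives you the importances for the first class's one-vs-rest problem; there is one fitted pipeline per class. If you want a single ranking across the whole multi-label problem, one reasonable choice (an assumption on my part, not the only option) is to average the importances over all per-class estimators and pair them with the feature names:

    ```python
    import numpy as np
    from sklearn import datasets
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    iris = datasets.load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.3, stratify=iris.target, random_state=0
    )

    classifier = OneVsRestClassifier(
        make_pipeline(RandomForestClassifier(random_state=42))
    )
    classifier.fit(X_train, y_train)

    # One row of importances per class (one fitted pipeline per class).
    per_class = np.array([
        est.named_steps['randomforestclassifier'].feature_importances_
        for est in classifier.estimators_
    ])

    # Average across classes to get a single overall ranking.
    mean_importances = per_class.mean(axis=0)
    for name, imp in sorted(zip(iris.feature_names, mean_importances),
                            key=lambda t: t[1], reverse=True):
        print(f'{name}: {imp:.3f}')
    ```

    Since each forest's importances sum to 1, the averaged values do as well, so they remain directly comparable as relative importances.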