Tags: python-3.x, scikit-learn, random-forest, decision-tree

Calculating the feature importance of every feature for every tree in a Random Forest


I am using the Python library class sklearn.ensemble.RandomForestClassifier, and I want to know the importance of every feature in every tree. Suppose I have P features and M trees. I want to compute a PxM matrix in which each entry is the importance of one feature in one tree. Looking at the sklearn source code for Random Forest feature importances, I think the all_importances variable in that method is this PxM matrix. How can I access that variable?

Thanks in advance.


Solution

  • You can access the individual fitted trees through the forest's .estimators_ attribute and then read feature_importances_ on each tree.

    Here is an example:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.datasets import make_classification
    
    X, y = make_classification(n_samples=1000, n_features=4,
                               n_informative=2, n_redundant=0,
                               random_state=0, shuffle=False)
    clf = RandomForestClassifier(n_estimators=5, max_depth=2,
                                 random_state=0)
    clf.fit(X, y)
    
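    # One importance vector of length P per fitted tree, collected into a list of M arrays.
    # (Transposing with .T does nothing to a 1-D array, so it is omitted here.)
    feature_imp_ = [tree.feature_importances_ for tree in clf.estimators_]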
    

    output:

    [array([0.02057642, 0.96636638, 0.        , 0.01305721]),
     array([0.86128406, 0.        , 0.13871594, 0.        ]),
     array([0.00471007, 0.98648234, 0.        , 0.00880759]),
     array([0.02730208, 0.97269792, 0.        , 0.        ]),
     array([0.65919044, 0.34080956, 0.        , 0.        ])]
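
The list above holds one length-P array per tree, so it is M rows by P columns rather than the PxM matrix asked for. A minimal sketch of stacking it into PxM form follows; the names importance_matrix and mean_importance are illustrative and not part of scikit-learn. It also checks that averaging the columns and renormalizing reproduces the forest-level feature_importances_, which should hold for this example, although the exact aggregation may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Same toy setup as in the answer above.
X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)
clf = RandomForestClassifier(n_estimators=5, max_depth=2, random_state=0)
clf.fit(X, y)

# Each tree contributes one column, each feature one row: shape (P, M).
importance_matrix = np.column_stack(
    [tree.feature_importances_ for tree in clf.estimators_]
)
print(importance_matrix.shape)  # (4, 5)

# Average over trees, then renormalize so the importances sum to 1.
# This is assumed to mirror how the forest aggregates per-tree importances.
mean_importance = importance_matrix.mean(axis=1)
mean_importance = mean_importance / mean_importance.sum()
print(np.allclose(mean_importance, clf.feature_importances_))
```

Here importance_matrix[p, m] is the importance of feature p in tree m, so a single tree's vector is importance_matrix[:, m].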