I have a dataset where most of the columns contain text values, so I used TF-IDF and count vectorizers to convert it into vector form. As a result, I got a sparse matrix. I trained a decision tree classifier on it and got the expected results. Now I want to build another model that uses only the features with non-zero feature importance, but I am not able to filter out those features.
X_tr
<65548x3101 sparse matrix of type '<class 'numpy.float64'>'
with 7713590 stored elements in Compressed Sparse Row format>
Here, X_tr is my training dataset.
X_tr.shape
(65548, 3101)
dtc.feature_importances_.shape
(3101,)
Here, 'dtc' is my decision tree classifier model.
My question is: how can I get another sparse matrix that contains only the features whose feature importance is non-zero?
I think this should be as simple as:
X_tr[:, dtc.feature_importances_ != 0]
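Here is a minimal, self-contained sketch of that idea. The small random matrix, the labels, and the names mask, X_tr_selected and kept_columns are made up for illustration and stand in for your real data; only X_tr and dtc come from the question.

```python
import numpy as np
from scipy import sparse
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real training data (assumption for illustration).
rng = np.random.default_rng(0)
X_tr = sparse.random(200, 30, density=0.2, format="csr", random_state=0)
y_tr = rng.integers(0, 2, size=200)

# A shallow tree, so that most features end up with zero importance.
dtc = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

# Boolean mask with one entry per column: True where the importance is non-zero.
mask = dtc.feature_importances_ != 0

# Column-wise boolean indexing keeps the result sparse.
X_tr_selected = X_tr[:, mask]

# Original column indices of the kept features, e.g. to map back to vocabulary terms.
kept_columns = np.flatnonzero(mask)

print(X_tr.shape, "->", X_tr_selected.shape)
```

Keep the mask around and apply the same one to any test or validation matrix (for example X_te[:, mask], where X_te is a hypothetical name), so the new model sees the same columns at prediction time.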