python scikit-learn chi-squared

How to get feature names corresponding to scores for chi square feature selection in scikit


I am using scikit-learn for feature selection, and I want to get the score values for all the unigrams in the text. I get the scores, but how do I map them to the actual feature names?

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

Texts = ["should schools have uniform", "schools discipline", "legalize marriage", "marriage culture"]
labels = ["3", "3", "7", "7"]
vectorizer = CountVectorizer()
# Build the document-term count matrix (one column per unigram)
term_doc = vectorizer.fit_transform(Texts)
# k must be passed by keyword in recent scikit-learn releases
ch2 = SelectKBest(chi2, k="all")
X_train = ch2.fit_transform(term_doc, labels)
print(ch2.scores_)

This prints the scores, but how do I know which feature name maps to which score?


Solution

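  • It's right there in the documentation: the vectorizer returns its feature names in the same column order as ch2.scores_, so the two can be paired index by index. In scikit-learn 1.0 and later the method is get_feature_names_out(); the older get_feature_names() was deprecated in 1.0 and removed in 1.2.

    vectorizer.get_feature_names_out()  # get_feature_names() before scikit-learn 1.0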
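
    For what it's worth, here is a minimal sketch of pairing the names with the scores, reusing the data from the question. The variable names selector, feature_names and name_to_score are my own, and the hasattr check is just one way to cope with the method rename across scikit-learn versions, not something the documentation prescribes.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

texts = ["should schools have uniform", "schools discipline",
         "legalize marriage", "marriage culture"]
labels = ["3", "3", "7", "7"]

vectorizer = CountVectorizer()
term_doc = vectorizer.fit_transform(texts)

# scores_ holds one chi-squared score per input column, whatever k is
selector = SelectKBest(chi2, k="all")
selector.fit(term_doc, labels)

# Newer releases expose get_feature_names_out(); fall back for old ones
if hasattr(vectorizer, "get_feature_names_out"):
    feature_names = vectorizer.get_feature_names_out()
else:
    feature_names = vectorizer.get_feature_names()

# Column i of term_doc corresponds to feature_names[i] and scores_[i]
name_to_score = dict(zip(feature_names, selector.scores_))

# Print the unigrams from highest to lowest score
for name, score in sorted(name_to_score.items(),
                          key=lambda item: item[1], reverse=True):
    print(name, score)
```

    Zipping works because CountVectorizer assigns each vocabulary term a fixed column index, and chi2 scores the columns in that same order. If you only want the names of the features that were kept when k is smaller than "all", selector.get_support() returns a boolean mask over those same columns.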