Tags: python-2.7, machine-learning, scikit-learn, naive-bayes

Python - SelectFromModel with Naive-Bayes


I am using SelectFromModel in combination with MultinomialNB for feature selection in a text classification task.

SelectFromModel(estimator=MultinomialNB(alpha=1.0))
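For context, here is a minimal, self-contained sketch of this setup. The documents and labels are made up for illustration, and the importance_getter argument is an assumption for newer scikit-learn releases (1.0+), where MultinomialNB no longer exposes coef_; on the older versions this question targets, the one-liner above works as-is.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectFromModel
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus, just to make the snippet runnable.
docs = ["spam spam offer", "meeting at noon", "free offer now", "noon meeting notes"]
labels = [1, 0, 1, 0]

pipe = Pipeline([
    ("vect", CountVectorizer()),
    # The selector fits its own MultinomialNB and keeps features whose
    # importance (the norm quoted below) exceeds the default threshold.
    ("select", SelectFromModel(
        estimator=MultinomialNB(alpha=1.0),
        # Assumption: on scikit-learn >= 1.0 point the selector at the
        # log-probability matrix, which is what coef_ used to expose.
        importance_getter="feature_log_prob_",
    )),
    ("clf", MultinomialNB(alpha=1.0)),
])
pipe.fit(docs, labels)
print(pipe.named_steps["select"].get_support())  # boolean mask of kept terms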

SelectFromModel determines the importance of features by computing:

importances = np.linalg.norm(estimator.coef_, axis=0, ord=norm_order)

But isn't this exactly the opposite of what I want, given that coef_ holds log probabilities (all non-positive), so high-frequency features end up close to zero and get the lowest absolute values?
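To make that concrete, here is a toy sketch with a made-up count matrix. The frequent feature's log probability sits near zero, so it receives the smaller norm:

import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical counts: feature 0 is frequent, feature 1 is rare.
X = np.array([[50, 1],
              [40, 2],
              [45, 1],
              [ 5, 1]])
y = np.array([0, 0, 1, 1])

nb = MultinomialNB(alpha=1.0).fit(X, y)

# feature_log_prob_ is what coef_ exposed on older releases.
print(nb.feature_log_prob_)
# Same norm as in the quoted line (norm_order defaults to 1):
print(np.linalg.norm(nb.feature_log_prob_, axis=0, ord=1))
# -> roughly [0.10, 6.06]: the frequent feature gets the *smaller* importance.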

There are already multiple well-answered questions about determining the importance of features for a specific class, but not about feature importance in general.

Is there a way to determine the feature importance with SelectFromModel in combination with NB or are other approaches better suited for this task?


Solution

  • There is a utility known as Recursive Feature Elimination with Cross-Validation (RFECV) in sklearn. It recursively ranks features by their importance and performs cross-validation to find the best number of features for the estimator specified. You can look at the RFECV example in the scikit-learn documentation for more information; a short sketch follows below.

    I am not sure why SelectFromModel is not working with Naive Bayes; I will update this answer if I find anything related to it. In the meantime, you can check whether RFECV suits your needs.
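    A minimal RFECV sketch, with synthetic count data standing in for a real bag-of-words matrix; as above, the importance_getter argument is an assumption for scikit-learn 1.0+, where MultinomialNB no longer exposes coef_:

    import numpy as np
    from sklearn.feature_selection import RFECV
    from sklearn.model_selection import StratifiedKFold
    from sklearn.naive_bayes import MultinomialNB

    rng = np.random.RandomState(0)
    # Hypothetical bag-of-words counts: 200 docs x 30 terms.
    X = rng.poisson(lam=2.0, size=(200, 30))
    y = rng.randint(0, 2, size=200)
    X[y == 1, :5] += 3  # make the first five terms informative

    selector = RFECV(
        estimator=MultinomialNB(alpha=1.0),
        step=1,                      # drop one feature per elimination round
        cv=StratifiedKFold(5),
        scoring="accuracy",
        # Assumption: rank by the log-probability matrix, which is what
        # coef_ exposed on older scikit-learn releases.
        importance_getter="feature_log_prob_",
    )
    selector.fit(X, y)

    print("optimal number of features:", selector.n_features_)
    print("kept features:", np.flatnonzero(selector.support_))

    Note that RFE ranks by the same coefficient magnitudes discussed in the question, so the frequency caveat applies here too; the cross-validation only picks how many of the ranked features to keep.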