scikit-learn  feature-extraction  feature-selection  naivebayes  countvectorizer

Can I add and remove features manually from CountVectorizer?


I'm doing text classification using naive Bayes with CountVectorizer, and I'm looking for a way to add and remove features manually. Maybe I can remove features through stop_words (is that the best way?), but I couldn't find a way to add features. If I use the 'vocabulary' parameter, no features are extracted from the text other than the ones present in the vocabulary, and that's a problem.


Solution

  • Yes, removing features with stop_words is the best way to keep the results consistent. You could also walk through the data and remove the words manually, but the effect is the same as listing them in stop_words. To add your own words to sklearn's built-in English stop word list, do this:

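    from sklearn.feature_extraction import text
    additional_stop_words = ["foo", "bar"]  # placeholder: the words you want removed
    stop_words = list(text.ENGLISH_STOP_WORDS.union(additional_stop_words))  # list, not frozenset
    vectorizer = text.CountVectorizer(stop_words=stop_words)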
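
    The answer above only covers removal. For the "add features" half of the question, one possible workaround (a sketch, not something the original answer proposes) is to fit CountVectorizer once to learn the vocabulary from the text, union that vocabulary with the extra terms, and then build a second CountVectorizer with the combined list passed as 'vocabulary'. The toy corpus and the names words_to_remove and words_to_add are made up for illustration. Added terms are tokens that the default tokenizer would have to produce for them to ever get a non-zero count.

```python
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS

# Hypothetical toy corpus and word lists, for illustration only.
docs = ["the cat sat on the mat", "the dog chased the cat"]
words_to_remove = ["mat"]   # features to drop by hand
words_to_add = ["bird"]     # features to force in, even if never seen in docs

# Removal: extend the built-in English list. A list is passed because
# recent scikit-learn versions validate the type of stop_words.
stop_words = list(ENGLISH_STOP_WORDS.union(words_to_remove))

# Step 1: let CountVectorizer learn a vocabulary from the text.
learned = CountVectorizer(stop_words=stop_words).fit(docs)

# Step 2: union the learned terms with the manual additions, then use the
# result as a fixed vocabulary so both sources of features are kept.
vocabulary = sorted(set(learned.vocabulary_) | set(words_to_add))
vectorizer = CountVectorizer(vocabulary=vocabulary)
X = vectorizer.fit_transform(docs)

# Mapping from each feature (term) to its column index in X.
feature_index = vectorizer.vocabulary_
```

    With a fixed vocabulary the second vectorizer ignores every term not in the list, so the removed words stay out without passing stop_words again. The cost is that the text is scanned twice, and the learned vocabulary has to be recomputed whenever the training data changes.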