scikit-learn feature-extraction feature-selection naivebayes countvectorizer

Can I add and remove features manually from CountVectorizer?

I'm doing text classification using naive Bayes with CountVectorizer, and I'm looking for a way to add and remove features manually. Maybe I can remove features through stop_words (is that the best way?), but I couldn't find a way to add features. If I use the 'vocabulary' parameter, no features are extracted from the text other than the ones present in that vocabulary, and that's a problem.

Solution

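Yes, removing features through stop_words is the simplest way to keep the results consistent, because the same words are dropped every time the vectorizer is fitted or applied. You could also go through the documents and strip the words out yourself, but that has the same effect as listing them in stop_words. To extend scikit-learn's built-in English stop word list with your own words, do this: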

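from sklearn.feature_extraction import text
additional_stop_words = ["foo", "bar"]  # placeholder: your own words to remove
# Pass the result as CountVectorizer(stop_words=stop_words); recent versions expect a list.
stop_words = list(text.ENGLISH_STOP_WORDS.union(additional_stop_words))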
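For the "add features" half of the question, one possible approach (a sketch, not something the answer above prescribes) is to let a first CountVectorizer learn the vocabulary from the text, edit that set by hand, and then build a second vectorizer with the edited set as its fixed vocabulary. The documents and the names extra_terms and removed_terms below are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus (illustrative only)
docs = ["the cat sat on the mat", "the dog chased the cat"]

# Hypothetical manual edits to the feature set
extra_terms = {"bird"}          # features to add even though they never occur
removed_terms = {"the", "on"}   # features to drop

# Pass 1: learn the vocabulary from the text as usual
learner = CountVectorizer()
learner.fit(docs)
learned_terms = set(learner.vocabulary_)

# Edit the learned vocabulary by hand
final_vocabulary = sorted((learned_terms | extra_terms) - removed_terms)

# Pass 2: vectorize with the edited vocabulary fixed in place
vectorizer = CountVectorizer(vocabulary=final_vocabulary)
X = vectorizer.fit_transform(docs)

print(final_vocabulary)
print(X.toarray())
```

An added term that never occurs in the training text simply gets an all-zero column, so it only matters if it shows up in later documents.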
