I am using MALLET for text classification (with Naive Bayes), and I understand that the FeatureSequence2FeatureVector class creates feature vectors and can be used as part of a Pipe. My question is which weighting scheme is applied when we use FeatureSequence2FeatureVector() with no arguments and when we use FeatureSequence2FeatureVector(boolean x). With the second one, x=true should result in Bernoulli Naive Bayes, I suppose. But what about the no-argument and x=false versions?
By default, and equivalently when false is passed, FeatureSequence2FeatureVector will set feature values to raw feature counts. For example, the string "dog cat dog" will map to
{ "dog": 2.0, "cat": 1.0 }
Passing true as an argument makes the features binary (present or absent), which will result in
{ "dog": 1.0, "cat": 1.0 }
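To make the difference concrete, here is a minimal sketch in plain Java that mimics the two behaviours described above. It does not use MALLET at all: the class `FeatureWeightingSketch` and the method `toFeatureVector` are hypothetical names made up for illustration, and whitespace tokenization is an assumption (in MALLET the tokenization happens in earlier pipes).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT MALLET's implementation.
class FeatureWeightingSketch {

    // Maps a whitespace-tokenized string to feature values.
    // binary == false: value is the raw count of each token.
    // binary == true:  value is 1.0 for every token that occurs at least once.
    static Map<String, Double> toFeatureVector(String text, boolean binary) {
        Map<String, Double> features = new LinkedHashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // skip the empty token produced by blank input
            }
            if (binary) {
                features.put(token, 1.0);                 // presence only
            } else {
                features.merge(token, 1.0, Double::sum);  // accumulate counts
            }
        }
        return features;
    }

    public static void main(String[] args) {
        System.out.println(toFeatureVector("dog cat dog", false)); // {dog=2.0, cat=1.0}
        System.out.println(toFeatureVector("dog cat dog", true));  // {dog=1.0, cat=1.0}
    }
}
```

In a real MALLET pipeline you would not write this yourself; you would choose between the no-argument constructor and the boolean one when building the pipe list.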