I am reading this document to get a handle on Naive Bayes. The link should open at page 35:
https://web.stanford.edu/class/cs124/lec/naivebayes.pdf#page=35
Given two documents, "a b b" and "c d d", would the vocabulary be:
{a,b,b,c,d,d}, |Vocabulary| == 6
or: {a,b,c,d}, |Vocabulary| == 4
Just need a sanity check, thanks.
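The vocabulary would be {a,b,c,d}, so |Vocabulary| == 4. Vocabulary refers to the set of distinct terms the classifier works with, sometimes called the feature space of the classifier. Repeated tokens such as the second "b" or "d" add to word counts, not to the vocabulary.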
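Here is a minimal Python sketch of the answer above. It builds the vocabulary as a set from the two example documents. Then, assuming |Vocabulary| is being used in add-one (Laplace) smoothing for multinomial Naive Bayes, it shows why the distinct count is the right number. The class name `class_1` and the helper `smoothed_prob` are hypothetical, chosen only for illustration.

```python
from collections import Counter

# The two example documents from the question, tokenized on whitespace.
docs = ["a b b", "c d d"]

# The vocabulary is the *set* of distinct terms across all documents,
# so repeated tokens ("b", "d") are counted only once.
vocabulary = set()
for doc in docs:
    vocabulary.update(doc.split())

print(sorted(vocabulary))  # ['a', 'b', 'c', 'd']
print(len(vocabulary))     # 4

# Assumption for illustration: add-one smoothing, where |V| is in the denominator:
#   P(w | class) = (count(w, class) + 1) / (total tokens in class + |V|)
# Treat the first document as the only training text of a hypothetical class "class_1".
counts_class1 = Counter(docs[0].split())   # counts: 'a' -> 1, 'b' -> 2
total_class1 = sum(counts_class1.values())  # 3 tokens

def smoothed_prob(word):
    # Counter returns 0 for words never seen in class_1 (here 'c' and 'd').
    return (counts_class1[word] + 1) / (total_class1 + len(vocabulary))

print(smoothed_prob("b"))  # (2 + 1) / (3 + 4) = 3/7
```

With |V| = 4, the smoothed probabilities for class_1 are 2/7, 3/7, 1/7 and 1/7, which sum to 1. With |V| = 6 they would sum to only 7/9, so counting duplicate tokens in the vocabulary would break the distribution.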