nlp naivebayes document-classification

Is the Naive Bayes vocabulary a set?


I am reading this document to get a handle on Naive Bayes. The link should open at page 35:

https://web.stanford.edu/class/cs124/lec/naivebayes.pdf#page=35

Given two documents, "a b b" and "c d d", would the vocabulary be:

{a,b,b,c,d,d}, |Vocabulary| == 6

or: {a,b,c,d}, |Vocabulary| == 4

Just need a sanity check, thanks.


Solution

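  • The vocabulary would be {a,b,c,d}, so |Vocabulary| == 4. Vocabulary refers to the set of distinct terms the classifier is working with, sometimes called the feature space of the classifier. Because it is a set, a term that occurs several times (like b or d here) is counted only once.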
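
As a quick sanity check, here is a minimal Python sketch that builds the vocabulary for the two documents in the question. It assumes the documents are tokenized by splitting on whitespace; the variable names (docs, tokens, vocabulary) are just illustrative and do not come from the linked slides.

```python
# The two toy documents from the question.
docs = ["a b b", "c d d"]

# Every token occurrence, duplicates included (whitespace tokenization is an assumption).
tokens = [tok for doc in docs for tok in doc.split()]

# The vocabulary is the set of distinct tokens across all documents.
vocabulary = set(tokens)

print(len(tokens))         # 6 token occurrences in total
print(sorted(vocabulary))  # ['a', 'b', 'c', 'd']
print(len(vocabulary))     # 4
```

The count of 6 is the number of token occurrences, not the vocabulary size. In add-one (Laplace) smoothing for multinomial Naive Bayes, the |V| added to the denominator is the size of this set, which is 4 here.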