
Is Naive Bayes sensitive to the number of training observations?


I'm using Swift (although my question is not language-specific) and Python to test my ML logic. I have this training data:

("add a new balloon", "add-balloon")
("add a balloon", "add-balloon")
("get last balloon", "get-balloon")
("update balloon color to red", "update-balloon")

When I try to use Naive Bayes to classify new sentences, I get:

classify("could you add a new balloon")
// Returns add-balloon
classify("could you update the balloon color")
// Returns add-balloon
classify("update the balloon color")
// Returns add-balloon

My data set has a lot of observations about adding a balloon (about 50) but only a few about updating or getting one (about 5-6). Is Naive Bayes sensitive to the number of training observations? I don't understand why the classifier performs poorly even when it is given a sentence it saw during training.


Solution

  • Naive Bayes is sensitive to class priors (the distribution of examples among classes). If you have far more add-balloon examples than examples of the other categories, it will be biased towards that class. This is normally helpful: if you know nothing about the input (no informative evidence), your best bet is the class that is most likely a priori.

    However, if your distribution is heavily skewed, your data set is small, or your documents are short or lack very informative words (or contain many ambiguous ones), this can cause undesired results such as the ones you are reporting.
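
To make the effect of the prior concrete, here is a minimal Python sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not the asker's implementation: the `train` and `classify` helpers, the `uniform_prior` flag, and the 50/5/5 class split are all assumptions chosen to mimic the imbalance described in the question.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def classify(text, model, uniform_prior=False):
    """Return the label with the highest log prior + log likelihood."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        # Empirical prior by default; optionally treat all classes as equally likely.
        prior = 1 / len(class_counts) if uniform_prior else count / total_docs
        score = math.log(prior)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical skewed training set: 50 "add" examples versus 5 each for the others.
examples = (
    [("add a new balloon", "add-balloon")] * 25
    + [("add a balloon", "add-balloon")] * 25
    + [("get last balloon", "get-balloon")] * 5
    + [("update balloon color to red", "update-balloon")] * 5
)
model = train(examples)

query = "change the color of a balloon"
print(classify(query, model))                      # add-balloon
print(classify(query, model, uniform_prior=True))  # update-balloon
```

In this toy setup, a query made only of weakly informative words goes to the majority class, and switching to a uniform prior flips the prediction. A query containing a distinctive word such as "update" is still classified as update-balloon here, because that word's likelihood outweighs the prior.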