What should be taken as m in the m-estimate of probability in Naive Bayes?
So for this example, what m value should I take? Can I take it to be 1?
Here p = prior probability = 0.5.
So can I take P(a_i|selected) = (n_c + 0.5) / (3 + 1)?
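For reference, the general formula I am applying is P(a_i|c) = (n_c + m*p) / (n + m). A minimal Python sketch of it (the function name m_estimate is mine):

    def m_estimate(n_c, n, m, p):
        # m-estimate of probability: (n_c + m*p) / (n + m)
        return (n_c + m * p) / (n + m)

    # With my values: n = 3 selected people, m = 1, p = 0.5
    print(m_estimate(2, 3, 1, 0.5))  # teacher: 0.625 = 5/8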
For Naive Bayes text classification, the formula given is P(w_k|v_j) = (n_k + 1) / (n + |Vocabulary|).
The book says this is adapted from the m-estimate by assuming uniform priors and setting m equal to the size of the vocabulary.
But if we have only 2 classes, then p = 0.5. So how can m*p be 1? Shouldn't it be |Vocabulary| * 0.5? How is this equation obtained from the m-estimate?
In calculating the probabilities for the attribute profession, with the prior probability p = 0.5 and taking m = 1:
P(teacher|selected) = (2 + 0.5) / (3 + 1) = 5/8
P(farmer|selected) = (1 + 0.5) / (3 + 1) = 3/8
P(business|selected) = (0 + 0.5) / (3 + 1) = 1/8
But shouldn't these probabilities add up to 1? In this case they do not: 5/8 + 3/8 + 1/8 = 9/8.
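A quick script (my own check, not from the book) confirming the sum:

    counts = {"teacher": 2, "farmer": 1, "business": 0}
    n, m, p = 3, 1, 0.5
    probs = {a: (n_c + m * p) / (n + m) for a, n_c in counts.items()}
    print(probs)                # {'teacher': 0.625, 'farmer': 0.375, 'business': 0.125}
    print(sum(probs.values()))  # 1.125, i.e. 9/8 -- not 1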
"m estimate of probability" is confusing.
In the given example, m and p should be like this:

m = 3    (* this can be any value; you choose it *)
p = 1/3  (* i.e. 1/|v|, where |v| is the number of unique values of the feature *)
If you use m = |v|, then m*p = 1, and that special case is called Laplace smoothing. The "m estimate of probability" is the generalized version of Laplace smoothing.
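To make this concrete, here is a small sketch (the variable names are mine) showing that once p = 1/|v|, the estimates sum to 1 no matter what m is. It also answers the vocabulary question: for text classification, p is uniform over the vocabulary (p = 1/|Vocabulary|), not over the classes, so m = |Vocabulary| gives m*p = 1 and the m-estimate reduces to (n_k + 1) / (n + |Vocabulary|):

    counts = {"teacher": 2, "farmer": 1, "business": 0}
    n = 3                  # training examples in the class
    v = len(counts)        # |v|: number of unique values of the feature
    p = 1 / v              # uniform prior over the feature's values

    for m in (3, 1):       # any m > 0 works once p = 1/|v|
        probs = {a: (n_c + m * p) / (n + m) for a, n_c in counts.items()}
        print(m, probs, sum(probs.values()))  # sums to 1 (up to float rounding)

    # Text classification: m = |Vocabulary| and p = 1/|Vocabulary|, so m*p = 1
    # and the estimate becomes (n_k + 1) / (n + |Vocabulary|) -- Laplace smoothing.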
In the above example, if you think m = 3 is too much, you can reduce m to 0.2 like this:
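(The original example is not shown here, but with the same counts and m = 0.2, p = 1/3, the numbers would be:)

P(teacher|selected) = (2 + 0.2 * 1/3) / (3 + 0.2) ≈ 0.646
P(farmer|selected) = (1 + 0.2 * 1/3) / (3 + 0.2) ≈ 0.333
P(business|selected) = (0 + 0.2 * 1/3) / (3 + 0.2) ≈ 0.021

These still sum to 1, but the smaller m keeps the estimates much closer to the raw frequencies 2/3, 1/3, 0.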