I am trying to learn the Gaussian Naive Bayes machine learning algorithm by implementing it myself.
I notice in my implementation that the final prediction probabilities across all the labels do not sum to 1.0. In fact, all of my prediction probabilities are very small numbers, on the order of 0.00000000000184 (1.84e-12). However, selecting the label with the maximum value among them still gives me highly accurate predictions.
So I am trying to get them to add up to 1, and I think the problem is that I only use the prior and likelihood calculations and ignore the normalizer in the denominator.
Here is an example from my dataset. Columns 0-3 are my attributes and column 4 is my labels.
So I'm trying to include the normalizer, but I can't figure out how. My labels column is categorical, so I know how to calculate the prior probability P(y) of each categorical value.
But my attributes are real-valued, so how can I calculate the prior probability P(X) of such real-valued attributes? Here is the formula I'm trying to implement; I am unable to understand the calculation for the denominator.
If you managed to calculate the numerator P(x|y)P(y) for each class y, then the denominator P(X) is simply the sum of these over all classes y:

P(X) = sum_y ( P(x|y)P(y) )
Note that the NB algorithm is about choosing the class y for which P(x|y)P(y) is highest. Dividing each of these results by P(X) will not change that conclusion, so it might not be worth calculating it.
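As a minimal sketch of that normalization step, assuming you already have the joint scores P(x|y)P(y) for each class (the `joint` dict and its values below are made-up placeholders standing in for the tiny numbers you observed):

```python
# Hypothetical joint scores P(x|y)P(y) per class -- placeholder values.
joint = {"A": 1.84e-12, "B": 3.10e-13, "C": 5.00e-14}

# The normalizer P(X) is the sum of the joints over all classes.
evidence = sum(joint.values())

# Posterior P(y|x) = P(x|y)P(y) / P(X); these now sum to 1.
posterior = {label: score / evidence for label, score in joint.items()}

print(posterior)

# Dividing by a common positive constant never changes the argmax.
assert max(joint, key=joint.get) == max(posterior, key=posterior.get)
```

This is why skipping the division is safe when you only need the predicted label: the ranking of the classes is identical before and after normalizing.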
Now it sounds like you have an issue with calculating P(x|y). This is where the modeling of the probability comes in.
One possibility is to say that, for each class y, the observed attributes x_i are all independent (that is what "naive" means) and Gaussian, so:

P(x|y) = product_i ( f(x_i; m_{y,i}, s_{y,i}) )

with x_i each of the 4 attributes on a row, f the Gaussian density, and m_{y,i} and s_{y,i} the mean and standard deviation of attribute i within class y.
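A minimal sketch of that model, assuming the dataset is a NumPy array whose last column holds the labels (the names `gaussian_pdf`, `predict_proba`, and the toy data below are my own illustration, not your code):

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Gaussian density f(x_i; m_{y,i}, s_{y,i}), evaluated elementwise."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def predict_proba(data, x):
    """Posterior P(y|x) for one attribute row x.
    Columns 0..n-2 of `data` are attributes, the last column is the label."""
    X, y = data[:, :-1].astype(float), data[:, -1]
    joint = {}
    for label in np.unique(y):
        Xc = X[y == label]
        prior = len(Xc) / len(X)                        # P(y)
        # Naive independence: product of per-attribute Gaussian densities.
        likelihood = np.prod(gaussian_pdf(x, Xc.mean(axis=0), Xc.std(axis=0)))
        joint[label] = prior * likelihood               # numerator P(x|y)P(y)
    evidence = sum(joint.values())                      # P(X) over all classes
    return {label: j / evidence for label, j in joint.items()}

# Toy data: two well-separated clusters labeled "a" and "b".
data = np.array([
    [1.0, 2.0, "a"], [1.2, 1.9, "a"], [0.9, 2.1, "a"],
    [5.0, 6.0, "b"], [5.1, 5.8, "b"], [4.9, 6.2, "b"],
], dtype=object)

post = predict_proba(data, np.array([1.1, 2.0]))
print(post)  # a point near the "a" cluster gets most of the posterior mass
```

Note that in practice the joints underflow easily (hence your 1e-12-sized numbers), so real implementations usually sum log-probabilities instead of multiplying densities.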