Below is the training dataset I am using for a Naive Bayes implementation in R (using the e1071 package), where X, Y, Z are the classes and V1, V2, V3, V4, V5 are the attributes:
Class V1 V2 V3 V4 V5
X Yes Yes No Yes Yes
X Yes Yes No No Yes
X Yes Yes No No Yes
X Yes Yes No No Yes
X No Yes No No Yes
X No Yes No No Yes
X No Yes No No Yes
X No No No No No
X No No No No No
X No No No No No
X No No No No No
X No No No No No
X No No No No No
X No No No No No
X No No No No No
X No No No No No
Y Yes Yes Yes No Yes
Y No No No No Yes
Y No No No No Yes
Y No No No No No
Y No No No No No
Y No No No No No
Y No No No No No
Z No Yes Yes No Yes
Z No No No No Yes
Z No No No No Yes
Z No No No No No
Z No No No No No
Z No No No No No
Z No No No No No
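For reproducibility, here is one way to build this training set in R (a sketch: the rep() counts follow the table above, and stringsAsFactors = TRUE makes all columns factors, which is what naiveBayes expects):
train <- data.frame(
  Class = rep(c("X", "Y", "Z"), times = c(16, 7, 7)),
  V1 = c(rep("Yes", 4), rep("No", 12), "Yes", rep("No", 13)),
  V2 = c(rep("Yes", 7), rep("No", 9), "Yes", rep("No", 6), "Yes", rep("No", 6)),
  V3 = c(rep("No", 16), "Yes", rep("No", 6), "Yes", rep("No", 6)),
  V4 = c("Yes", rep("No", 29)),
  V5 = c(rep("Yes", 7), rep("No", 9), rep("Yes", 3), rep("No", 4), rep("Yes", 3), rep("No", 4)),
  stringsAsFactors = TRUE
)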
The prior probabilities for the above dataset are X -> 0.5333333, Y -> 0.2333333, Z -> 0.2333333, and the conditional probabilities are:
V1
Y No Yes
X 0.7500000 0.2500000
Y 0.8571429 0.1428571
Z 1.0000000 0.0000000
V2
Y No Yes
X 0.5625000 0.4375000
Y 0.8571429 0.1428571
Z 0.8571429 0.1428571
V3
Y No Yes
X 1.0000000 0.0000000
Y 0.8571429 0.1428571
Z 0.8571429 0.1428571
V4
Y No Yes
X 0.9375 0.0625
Y 1.0000 0.0000
Z 1.0000 0.0000
V5
Y No Yes
X 0.5625000 0.4375000
Y 0.5714286 0.4285714
Z 0.5714286 0.4285714
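(As a sanity check, assuming the train data frame built above, these numbers can be reproduced directly; the model fitted below with naiveBayes also stores the conditional tables in its $tables component:)
prop.table(table(train$Class))                        # priors: 0.5333333 0.2333333 0.2333333
prop.table(table(train$Class, train$V1), margin = 1)  # conditional probabilities for V1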
Case 1: Laplace smoothing not used
I want to find out which class an instance belongs to, given V3 = Yes. So my test data is:
test <- data.frame(V3 = "Yes")
So, I have to find the probability of each class, i.e. P(X | V3=Yes), P(Y | V3=Yes) and P(Z | V3=Yes), and take the maximum of the three. Now, by Bayes' rule,
P(X | V3=Yes) = P(X) * P(V3=Yes | X) / P(V3)
From the conditional probabilities above, we know that P(V3=Yes | X) = 0, so P(X | V3=Yes) should be 0. For the other two classes, P(Y) * P(V3=Yes | Y) = 0.2333333 * 0.1428571 and the same for Z, so after normalizing, P(Y | V3=Yes) and P(Z | V3=Yes) should be 0.5 each.
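The same hand calculation in R, as a sketch using the priors and the V3 table above:
priors     <- c(X = 16, Y = 7, Z = 7) / 30    # class priors
likelihood <- c(X = 0, Y = 1/7, Z = 1/7)      # P(V3=Yes | class), read off the V3 table
unnorm     <- priors * likelihood
unnorm / sum(unnorm)                          # expected: 0.0 0.5 0.5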
But the output in R is different. I have used the naiveBayes function from the e1071 package. Below is the code and its corresponding output:
library(e1071)
model_nb <- naiveBayes(Class ~ ., data = train, laplace = 0)
results <- predict(model_nb, test, type = "raw")
print(results)
             X         Y         Z
[1,] 0.5714286 0.2142857 0.2142857
Can someone please explain why R produces this output?
Case 2: Laplace smoothing used
Same scenario as Case 1 with respect to the test data, the only difference being that laplace is set to 1. So, again I have to find the probability of each class, i.e. P(X | V3=Yes), P(Y | V3=Yes) and P(Z | V3=Yes), and take the maximum of the three.
Below are the conditional probabilities after Laplace smoothing (k = 1):
V1
Y No Yes
X 0.7222222 0.2777778
Y 0.7777778 0.2222222
Z 0.8888889 0.1111111
V2
Y No Yes
X 0.5555556 0.4444444
Y 0.7777778 0.2222222
Z 0.7777778 0.2222222
V3
Y No Yes
X 0.94444444 0.05555556
Y 0.77777778 0.22222222
Z 0.77777778 0.22222222
V4
Y No Yes
X 0.8888889 0.1111111
Y 0.8888889 0.1111111
Z 0.8888889 0.1111111
V5
Y No Yes
X 0.5555556 0.4444444
Y 0.5555556 0.4444444
Z 0.5555556 0.4444444
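(These match what e1071 itself computes with laplace = 1; for example for V3, assuming the train data frame from before, and with model_nb1 as a hypothetical name:)
model_nb1 <- naiveBayes(Class ~ ., data = train, laplace = 1)
model_nb1$tables$V3   # each entry is (count + 1) / (class size + 2), e.g. (0 + 1) / (16 + 2) = 0.05555556 for X, "Yes"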
From the Naive Bayes definition,
P(X | V3=Yes) = P(X) * P(V3=Yes | X) / P(V3)
P(Y | V3=Yes) = P(Y) * P(V3=Yes | Y) / P(V3)
P(Z | V3=Yes) = P(Z) * P(V3=Yes | Z) / P(V3)
After calculation I have
P(X | V3=Yes) = 0.53 * 0.05555556 / P(V3) = 0.029 / P(V3)
P(Y | V3=Yes) = 0.23 * 0.22222222 / P(V3) = 0.051 / P(V3)
P(Z | V3=Yes) = 0.23 * 0.22222222 / P(V3) = 0.051 / P(V3)
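Again the same hand calculation in R, as a sketch using the smoothed V3 table:
priors     <- c(X = 16, Y = 7, Z = 7) / 30
likelihood <- c(X = 1/18, Y = 2/9, Z = 2/9)   # smoothed P(V3=Yes | class)
unnorm     <- priors * likelihood
round(unnorm / sum(unnorm), 4)                # expected: 0.2222 0.3889 0.3889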
From the above calculation, there should be a tie between classes Y and Z. But the output in R is different: class X is shown as the winning class. Below is the code and its corresponding output:
model_nb <- naiveBayes(Class ~ ., data = train, laplace = 1)
results <- predict(model_nb, test, type = "raw")
print(results)
             X         Y         Z
[1,] 0.5811966 0.2094017 0.2094017
Again, can someone please explain why R produces this output? Am I going wrong anywhere in my calculation?
Also, I need some explanation of how P(V3) would be calculated when Laplace smoothing is used.
Thanks in advance!
The problem is that you are using just one sample for the test dataset, with only one value of V3. If you give a bit more test data you get sensible/expected results (focusing only on your case 1):
test <- data.frame(V3=c("Yes", "No"))
predict(model_nb, test, type="raw")
X Y Z
[1,] 0.007936508 0.4960317 0.4960317
[2,] 0.571428571 0.2142857 0.2142857
Note you don't get exactly 0, 0.5, 0.5 for V3="Yes", since the function uses a threshold that replaces zero probabilities; you can adjust it, see ?predict.naiveBayes for more info.
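For instance, passing a much smaller threshold (the default is 0.001 in current e1071 versions; treat that exact value as an assumption for your version) should bring the first row very close to the theoretical 0, 0.5, 0.5:
predict(model_nb, test, type = "raw", threshold = 1e-10)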
The problem is actually due to the internal implementation of predict.naiveBayes (the source code is in the CRAN repository). I'm not going to go into all the details, but basically I've debugged the function, and at a certain step there is this line,
newdata <- data.matrix(newdata)
which later decides which column of the conditional probability tables to use. With your original data the data.matrix looks like this:
data.matrix(data.frame(V3="Yes"))
V3
[1,] 1
thus it later assumes that the conditional probabilities are to be taken from column 1, i.e. the values 1.0000000, 0.8571429 and 0.8571429 for V3="No", and that's why you were getting results as if V3 was actually "No".
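You can verify this against your own formula: P(X) * P(V3=No | X) = 0.5333333 * 1.0000000 = 0.5333333, while P(Y) * P(V3=No | Y) = 0.2333333 * 0.8571429 = 0.2, and the same for Z; dividing each by their sum 0.9333333 gives exactly 0.5714286, 0.2142857 and 0.2142857, the output you got.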
However,
data.matrix(data.frame(V3=c("Yes", "No")))
V3
[1,] 2
[2,] 1
gives column 2 of the conditional probabilities when V3 is "Yes", and thus you get the right result.
I'm pretty sure your case 2 is just analogous.
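(A quick check confirms it: with the smoothed tables, the "No" column gives 0.5333333 * 0.9444444 = 0.5037037 for X and 0.2333333 * 0.7777778 = 0.1814815 for each of Y and Z; dividing by the sum 0.8666667 yields 0.5811966, 0.2094017 and 0.2094017, exactly your case 2 output.)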
Hope it helps.
EDIT after comments: I guess the easiest way to solve it would be to put all the data in one data.frame and select the indexes you use for training/testing your model. Many functions accept a subset argument to select the data used for training, and naiveBayes is no exception. However, for predict.naiveBayes you have to select the test rows by index yourself. Something like this
all_data <- rbind(train, c(NA, NA, NA, "Yes", NA, NA))  # row 31 is the test sample: only V3 is set
trainIndex <- 1:30
model_nb <- naiveBayes(Class ~ ., data = all_data, laplace = 0, subset = trainIndex)
predict(model_nb, all_data[-trainIndex, ], type = "raw")
gives the expected result.
X Y Z
[1,] 0.007936508 0.4960317 0.4960317
Note that this works because in this case the data.matrix operation gives the right result:
data.matrix(all_data[-trainIndex,])
Class V1 V2 V3 V4 V5
31 NA NA NA 2 NA NA
EDIT2 after comments: Some more details on why this is happening.
When you define your test dataframe containing only the single value "Yes", the conversion performed by data.matrix has no way to know that your variable V3 has 2 possible values, "Yes" and "No". test$V3 is actually a factor:
test <- data.frame(V3="Yes")
class(test$V3)
[1] "factor"
and as said it has only one level (there is no way for the data.frame to know there are actually 2):
levels(test$V3)
[1] "Yes"
The implementation of data.matrix, as you can see in the docs, uses the levels of the factor:
Factors and ordered factors are replaced by their internal codes.
Thus when converting test with data.matrix, it sees only one possible level of the factor and encodes it as its internal code, 1:
data.matrix(test)
V3
[1,] 1
However, when you do the trick of putting training and test data into the same dataframe, the factor levels are properly defined:
levels(all_data$V3)
[1] "No" "Yes"
The result would be the same if you did this:
test <- data.frame(V3=factor("Yes", levels=levels(all_data$V3)))
test
V3
1 Yes
levels(test$V3)
[1] "No" "Yes"
data.matrix(test)
V3
[1,] 2
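With the levels fixed this way, predict should again return the expected probabilities for V3="Yes" (the same as row [1,] further above):
predict(model_nb, test, type = "raw")
                X         Y         Z
[1,] 0.007936508 0.4960317 0.4960317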