I am working on a classification solution using the following process:
a. Perform Naive Bayes classification in R using e1071.
b. Get the a-priori table and the conditional probability tables.
c. Use those values for prediction in a PL/SQL program within an application, i.e. the eventual prediction will not use the R predict function.
In step b, I am seeing negative and greater-than-1 conditional probabilities returned by R after model generation. Are they really conditional probabilities?
I will illustrate the issue with two data sets: one that I am able to interpret and one that I am not.
Data set 1: Fruit identification (I saw this in a nice Naive Bayes illustration on this forum)
Data frame Fruit_All:

```
Long Sweet Yellow Fruit
Yes  Yes   Yes    Banana
Yes  Yes   Yes    Banana
Yes  Yes   Yes    Banana
Yes  Yes   Yes    Banana
No   Yes   Yes    Banana
No   Yes   Yes    Orange
No   Yes   Yes    Orange
No   Yes   Yes    Orange
Yes  Yes   Yes    Other
No   Yes   No     Other
Yes  Yes   Yes    Banana
Yes  Yes   Yes    Banana
Yes  No    Yes    Banana
Yes  No    No     Banana
No   No    Yes    Banana
No   No    Yes    Orange
No   No    Yes    Orange
No   No    Yes    Orange
Yes  Yes   No     Other
No   No    No     Other
```
Performing Naive Bayes classification:
`NB.fit <- naiveBayes(Fruit ~ ., data = Fruit_All, laplace = 0)`
where Fruit is the class column and Fruit_All is the complete data frame.
The conditional probabilities returned in NB.fit are exactly as expected, and every row neatly adds up to 1, e.g. 0.1 + 0.9 for Banana/Yellow:
```
Conditional probabilities:
   Long
Y        No  Yes
  Banana 0.2 0.8
  Orange 1.0 0.0
  Other  0.5 0.5

   Sweet
Y        No   Yes
  Banana 0.30 0.70
  Orange 0.50 0.50
  Other  0.25 0.75

   Yellow
Y        No   Yes
  Banana 0.10 0.90
  Orange 0.00 1.00
  Other  0.75 0.25

A-priori probabilities:
Banana Orange  Other
   0.5    0.3    0.2
```
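Since the goal is to re-use these numbers outside R (step c), it may help that they do not have to be scraped from the printed output; the fitted object exposes them directly. A small sketch, assuming the NB.fit from above (note that, as far as I know, e1071 stores class counts in `$apriori`, which the print method normalises):

```r
NB.fit$tables$Long                    # conditional probability table for Long
NB.fit$tables                         # one such table per predictor
NB.fit$apriori / sum(NB.fit$apriori)  # class counts -> a-priori probabilities
```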
I can use the above to easily write code that predicts the outcome for a given input, e.g. for Long, Sweet and Yellow all equal to Yes. The prediction is the fruit for which this product is maximum:

P(Long|Fruit) * P(Sweet|Fruit) * P(Yellow|Fruit) * P(Fruit), where P(Fruit) is the a-priori probability.
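As a sanity check, here is a minimal sketch of that hand-rolled scoring in R for the all-Yes input, with the Yes columns and the priors copied from the tables above (the vector names are my own):

```r
# P(Long=Yes|Fruit), P(Sweet=Yes|Fruit), P(Yellow=Yes|Fruit) per class,
# copied from the conditional probability tables above
long   <- c(Banana = 0.8, Orange = 0.0, Other = 0.5)
sweet  <- c(Banana = 0.7, Orange = 0.5, Other = 0.75)
yellow <- c(Banana = 0.9, Orange = 1.0, Other = 0.25)
prior  <- c(Banana = 0.5, Orange = 0.3, Other = 0.2)

scores <- long * sweet * yellow * prior
scores                    # Banana 0.252, Orange 0.000, Other 0.01875
names(which.max(scores))  # "Banana"
```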
Data set 2: the iris data set available in R
`NB.fit <- naiveBayes(Species ~ ., data = iris)`
```
Conditional probabilities:
   Sepal.Length
Y             [,1]      [,2]
  setosa     5.006 0.3524897
  versicolor 5.936 0.5161711
  virginica  6.588 0.6358796

   Sepal.Width
Y             [,1]      [,2]
  setosa     3.428 0.3790644
  versicolor 2.770 0.3137983
  virginica  2.974 0.3224966

   Petal.Length
Y             [,1]      [,2]
  setosa     1.462 0.1736640
  versicolor 4.260 0.4699110
  virginica  5.552 0.5518947

   Petal.Width
Y             [,1]      [,2]
  setosa     0.246 0.1053856
  versicolor 1.326 0.1977527
  virginica  2.026 0.2746501
```
In this case, the same function does not appear to return conditional probabilities: some of the values are greater than 1, and no row adds up to 1.
Note: if I use the predict function in R, I get correct predictions for iris.
I understand the iris data set is a bit different, since the variables are continuous numeric values rather than factors, unlike the fruit example.
For other, more complex data sets I even see negative values among the conditional probabilities returned by the classifier, although the final results within R are fine.
Questions:
1. Are the conditional probabilities returned for the iris data set really conditional probabilities?
2. Will the same product maximization I used in the fruit example hold for iris, and even for data sets where the conditional probabilities are negative?
3. Is it possible to write a custom prediction function based on the iris conditional probability tables?
This answer is about a year late, but I just stumbled upon the question. As you write, the predictors are numeric and are therefore treated differently than factors. What you get are the means (first column) and standard deviations (second column) of the conditional Gaussian distributions. Thus, for
```
   Petal.Width
Y          [,1]      [,2]
  setosa  0.246 0.1053856
```
we have that the mean Petal.Width for setosa is 0.246 and the standard deviation is about 0.105. You can verify this directly:
```r
library(dplyr)  # needed for the %>% pipe

iris %>%
  dplyr::filter(Species == "setosa") %>%
  dplyr::summarize(mean(Petal.Width), sd(Petal.Width))
#   mean(Petal.Width) sd(Petal.Width)
# 1             0.246       0.1053856
```
At prediction time, the Gaussian density evaluated at the observed value is plugged into Bayes' formula in place of P(feature|class) to obtain the proper conditional probabilities, so the same product maximization applies. This also explains the negative values you saw on other data sets: those entries are means of the predictors, and a mean can be negative.
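So a custom prediction function based on these tables is indeed possible: replace each P(feature|class) factor with dnorm(x, mean, sd) using the stored parameters. A minimal sketch under those assumptions (my own illustration, not e1071's internal code; the function name predict_iris is mine, and I normalise NB.fit$apriori from counts to priors):

```r
library(e1071)
NB.fit <- naiveBayes(Species ~ ., data = iris)

# Score one observation by multiplying the class prior with the Gaussian
# density of each feature, evaluated at the observed value
predict_iris <- function(fit, x) {
  priors <- fit$apriori / sum(fit$apriori)   # class counts -> priors
  scores <- sapply(names(priors), function(cl) {
    dens <- sapply(names(fit$tables), function(feat) {
      tab <- fit$tables[[feat]]              # rows: classes; cols: mean, sd
      dnorm(x[[feat]], mean = tab[cl, 1], sd = tab[cl, 2])
    })
    priors[[cl]] * prod(dens)
  })
  names(which.max(scores))
}

predict_iris(NB.fit, iris[1, 1:4])  # "setosa", matching predict(NB.fit, iris[1, ])
```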