
Zero-R model calculation of Sensitivity and Specificity using Confusion Matrix and Statistics with Caret


Here are my results from the confusionMatrix() function in R, based on a Zero-R model. I may have set up the function incorrectly: there's a mismatch between what I calculated manually (where the answer varied with the randomized seed) and the confusionMatrix() function's answer of the sensitivity simply being 1.0000:

> sensitivity1 = 213/(213+128)
> sensitivity2 = 211/(211+130)
> sensitivity3 = 215/(215+126)
> #specificity = 0/(0+0) there were no other predictions
> specificity = 0
> specificity
[1] 0
> sensitivity1
[1] 0.6246334
> sensitivity2
[1] 0.6187683
> sensitivity3
[1] 0.6304985
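
For cross-checking the by-hand numbers, caret also exposes sensitivity() and specificity() helpers that work directly on factor vectors. A minimal sketch, using the prediction and reference vectors that appear below:

library(caret)

# Coerce both vectors to factors with the same level order
pred <- factor(testDiagnosisPred, levels = c("B", "M"))
ref  <- factor(testDiagnosis,     levels = c("B", "M"))

sensitivity(pred, ref, positive = "B")   # TP / (TP + FN)
specificity(pred, ref, negative = "M")   # TN / (TN + FP)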

There is a warning message, but the function still runs and refactors the data to match because the factor levels weren't in the same order; this is likely down to how the train and test splits were ordered and randomized. I went back and made sure the train and test sets didn't have reversed ordering from the negative-index subsetting, and that they didn't have different numbers of rows. Here are the results from caret's confusionMatrix() function (a sketch for avoiding the warning follows the output below):

> confusionMatrix(as.factor(testDiagnosisPred), as.factor(testDiagnosis), positive="B") 
Confusion Matrix and Statistics

          Reference
Prediction   B   M
         B 211 130
         M   0   0
                                          
               Accuracy : 0.6188          
                 95% CI : (0.5649, 0.6706)
    No Information Rate : 0.6188          
    P-Value [Acc > NIR] : 0.524           
                                          
                  Kappa : 0               
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 1.0000          
            Specificity : 0.0000          
         Pos Pred Value : 0.6188          
         Neg Pred Value :    NaN          
             Prevalence : 0.6188          
         Detection Rate : 0.6188          
   Detection Prevalence : 1.0000          
      Balanced Accuracy : 0.5000          
                                          
       'Positive' Class : B               
                                          
Warning message:
In confusionMatrix.default(as.factor(testDiagnosisPred), as.factor(testDiagnosis),  :
  Levels are not in the same order for reference and data. Refactoring data to match.
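
The warning itself is harmless here, but one way to avoid it entirely (a sketch, assuming both vectors only ever contain "B" and "M") is to give data and reference the same explicit level order before calling confusionMatrix():

library(caret)

# Pin both vectors to the same explicit level order so caret
# doesn't need to refactor one to match the other
lvls <- c("B", "M")
pred <- factor(testDiagnosisPred, levels = lvls)
ref  <- factor(testDiagnosis,     levels = lvls)

confusionMatrix(pred, ref, positive = "B")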

The testDiagnosisPred table just shows that the model guesses Benign (B) as the diagnosis for every cancer test in the data set; the counts vary by seed because the actual Benign (B) and Malignant (M) results get shuffled into train and test differently each time (a minimal Zero-R sketch follows the counts below).

testDiagnosisPred
  B 
341 
> ## testDiagnosisPred
> ##   B 
> ## 228
> 
> majorityClass # class distribution of the test labels, not a full confusion matrix

  B   M 
211 130 
> ## 
> ##   B   M 
> ## 213 128
> 
> # another seed's class distribution
> ## B   M 
> ## 211 130 
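
For reference, Zero-R itself is only a couple of lines. A minimal sketch, assuming the cancerdata.train frame with its Diagnosis column as shown below:

# Zero-R: find the majority class in the training labels...
majorityClass <- names(which.max(table(cancerdata.train$Diagnosis)))

# ...and predict it for every test case, ignoring all attributes
testDiagnosisPred <- rep(majorityClass, length(testDiagnosis))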

Here's what some of the data looks like using the head() and str() functions:

> head(testDiagnosisPred)
[1] "B" "B" "B" "B" "B" "B"
> head(cancerdata.train$Diagnosis)
[1] "B" "B" "M" "M" "M" "B"
> head(testDiagnosis)
[1] "B" "B" "M" "M" "M" "B"
> 
> str(testDiagnosisPred)
 chr [1:341] "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" ...
> str(cancerdata.train$Diagnosis)
 chr [1:341] "B" "B" "M" "M" "M" "B" "B" "B" "M" "M" "M" "B" "M" "M" "B" "M" "B" "B" "B" "M" "B" "B" "B" "B" ...
> str(testDiagnosis)
 chr [1:341] "B" "B" "M" "M" "M" "B" "B" "B" "M" "M" "M" "B" "M" "M" "B" "M" "B" "B" "B" "M" "B" "B" "B" "B" ...
> 
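A couple of quick sanity checks on those vectors (a sketch) confirm the alignment and the all-"B" predictions:

# Same number of test rows in both vectors?
length(testDiagnosisPred) == length(testDiagnosis)    # TRUE (both 341)

# Classes present in the reference but never predicted
setdiff(unique(testDiagnosis), unique(testDiagnosisPred))    # "M"
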

Solution

  • The confusion with the confusion matrix and the calculations of specificity and sensitivity occurred because of reading the matrix horizontally instead of vertically; the correct answer comes from the confusionMatrix() function in caret. Read down the columns, sensitivity = 211/(211+0) = 1.0 and specificity = 0/(0+130) = 0.0. Another way of knowing this is that it's a Zero-R model, and a Zero-R model whose positive class is the majority class always gives 1.00 sensitivity and 0.00 specificity! That's because Zero-R uses zero rules and zero attributes and just predicts the majority class for everything (see the sketch at the end of this answer).

    > confusionMatrix(as.factor(testDiagnosisPred), as.factor(testDiagnosis), positive="B") 
    Confusion Matrix and Statistics
    
              Reference
    Prediction   B   M
             B 211 130
             M   0   0
                                              
                   Accuracy : 0.6188                  
                                              
                Sensitivity : 1.0000          
                Specificity : 0.0000 
    

    When I did these manual specificity and sensitivity calculations, I misread the confusion matrix horizontally instead of vertically; a row-wise ratio like 211/(211+130) is actually the positive predictive value (0.6188), not the sensitivity:

    > sensitivity1 = 213/(213+128)
    > sensitivity2 = 211/(211+130)
    > sensitivity3 = 215/(215+126)
    > #specificity = 0/(0+0) there were no other predictions
    > specificity = 0
    > specificity
    [1] 0
    > sensitivity1
    [1] 0.6246334
    > sensitivity2
    [1] 0.6187683
    > sensitivity3
    [1] 0.6304985
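
    Reading the same matrix vertically reproduces caret's numbers. A minimal sketch using the counts above:

    cm <- matrix(c(211, 0, 130, 0), nrow = 2,
                 dimnames = list(Prediction = c("B", "M"),
                                 Reference  = c("B", "M")))

    # Sensitivity: read DOWN the Reference = B column
    cm["B", "B"] / sum(cm[, "B"])    # 211/(211+0) = 1.0

    # Specificity: read DOWN the Reference = M column
    cm["M", "M"] / sum(cm[, "M"])    # 0/(0+130) = 0.0

    # The row-wise ratio is the positive predictive value, not sensitivity
    cm["B", "B"] / sum(cm["B", ])    # 211/(211+130) = 0.6188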