
caret package confusion matrix define positive case with multiple classes


From the docs of caret::confusionMatrix:

positive: an optional character string for the factor level that
corresponds to a "positive" result (if that makes sense for your
data). If there are only two factor levels, the first level will
be used as the "positive" result.

This sounds as if it were possible to define a positive case in a multiclass problem and so get a classic binary confusion matrix of positive (the chosen class) vs. negative (all other classes). However, passing the positive argument with multiclass data doesn't change the output of confusionMatrix.

# generate fake data
data = data.frame(measured=as.factor(rep(c('A', 'B', 'C'), c(30,40,30))),
    modeled=as.factor(rep(c('A', 'B', 'C', 'A'), c(30,10,20,40))))

# get confusion matrix
matrix = caret::confusionMatrix(data$modeled, data$measured, positive='A')

gives

Confusion Matrix and Statistics

          Reference
Prediction  A  B  C
         A 30 10 30
         B  0 10  0
         C  0 20  0

Overall Statistics

               Accuracy : 0.4             
                 95% CI : (0.3033, 0.5028)
    No Information Rate : 0.4             
    P-Value [Acc > NIR] : 0.5379          

                  Kappa : 0.1304          
 Mcnemar's Test P-Value : 5.878e-13       

Statistics by Class:

                     Class: A Class: B Class: C
Sensitivity            1.0000   0.2500   0.0000
Specificity            0.4286   1.0000   0.7143
Pos Pred Value         0.4286   1.0000   0.0000
Neg Pred Value         1.0000   0.6667   0.6250
Prevalence             0.3000   0.4000   0.3000
Detection Rate         0.3000   0.1000   0.0000
Detection Prevalence   0.7000   0.1000   0.2000
Balanced Accuracy      0.7143   0.6250   0.3571

Did I simply misinterpret the docs, or is there really a way to get the binary matrix? I know that I can produce the desired output myself, but if there is a chance to be lazy, I will take it.


Solution

  • Looks like a misinterpretation: positive is not used anywhere when there are more than two classes. First caret:::confusionMatrix.default gets called for some "formalities", and then we go to caret:::confusionMatrix.table. There, positive is used several times inside the two-class case, but nowhere outside that if block.

    As you said, it's not hard to do this by hand. For a quick look you can simply use

    table(data.frame(data == "A"))
    #         modeled
    # measured FALSE TRUE
    #    FALSE    30   40
    #    TRUE      0   30
    

    where TRUE corresponds to the positive class A and FALSE to everything else (rows are measured, columns are modeled).
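
If the full binary statistics are wanted rather than just the counts, one option is to collapse both factors to two levels first, because positive is honoured in the two-class case. The following is a minimal sketch under that assumption, reusing the fake data from the question; one_vs_rest and the "other" label are hypothetical names introduced here for illustration, not part of caret.

```r
# Same fake data as in the question
data <- data.frame(
  measured = factor(rep(c("A", "B", "C"), c(30, 40, 30))),
  modeled  = factor(rep(c("A", "B", "C", "A"), c(30, 10, 20, 40)))
)

# Hypothetical helper (not part of caret): keep `positive`,
# collapse every other class into the single level "other"
one_vs_rest <- function(x, positive) {
  factor(ifelse(x == positive, positive, "other"),
         levels = c(positive, "other"))
}

measured_bin <- one_vs_rest(data$measured, "A")
modeled_bin  <- one_vs_rest(data$modeled, "A")

# Base-R cross-tabulation: rows are predictions, columns are the reference
binary_table <- table(Prediction = modeled_bin, Reference = measured_bin)
print(binary_table)

# With only two levels left, caret uses `positive` as documented
if (requireNamespace("caret", quietly = TRUE)) {
  print(caret::confusionMatrix(modeled_bin, measured_bin, positive = "A"))
}
```

Note that the "Class: A" column of the multiclass output in the question already reports the one-vs-rest statistics for A (sensitivity 1.0000, specificity 0.4286), so collapsing the factors mainly helps when the 2x2 table itself or the two-class output format is needed.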