From the documentation of caret::confusionMatrix:
positive: an optional character string for the factor level that
corresponds to a "positive" result (if that makes sense for your
data). If there are only two factor levels, the first level will
be used as the "positive" result.
This sounds as if it were possible to define a positive class in a multiclass problem and get a classic binary confusion matrix: positive (the chosen class) vs. negative (all other classes). However, passing the positive argument with multiclass data doesn't change the output of confusionMatrix.
# generate fake data
data = data.frame(measured=as.factor(rep(c('A', 'B', 'C'), c(30,40,30))),
modeled=as.factor(rep(c('A', 'B', 'C', 'A'), c(30,10,20,40))))
# get confusion matrix
matrix = caret::confusionMatrix(data$modeled, data$measured, positive='A')
gives
Confusion Matrix and Statistics

          Reference
Prediction  A  B  C
         A 30 10 30
         B  0 10  0
         C  0 20  0

Overall Statistics

               Accuracy : 0.4
                 95% CI : (0.3033, 0.5028)
    No Information Rate : 0.4
    P-Value [Acc > NIR] : 0.5379

                  Kappa : 0.1304
 Mcnemar's Test P-Value : 5.878e-13

Statistics by Class:

                     Class: A Class: B Class: C
Sensitivity            1.0000   0.2500   0.0000
Specificity            0.4286   1.0000   0.7143
Pos Pred Value         0.4286   1.0000   0.0000
Neg Pred Value         1.0000   0.6667   0.6250
Prevalence             0.3000   0.4000   0.3000
Detection Rate         0.3000   0.1000   0.0000
Detection Prevalence   0.7000   0.1000   0.2000
Balanced Accuracy      0.7143   0.6250   0.3571
Did I simply misinterpret the docs, or is there really a way to get the binary matrix? I know that I can produce the desired output myself, but if there is a chance to be lazy, I will take it.
Looks like a misinterpretation. The positive argument is not used anywhere when there are more than two classes. First caret:::confusionMatrix.default gets called for some "formalities", and then we go to caret:::confusionMatrix.table. There, positive is used several times inside the branch for exactly two classes, but nowhere outside that if case.
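One way to check this empirically is to compute the matrix twice with different values of positive and compare the statistics. This is only a sketch: it assumes caret is installed and reuses the fake data from the question; the names cm_a and cm_c are illustrative.

```r
library(caret)  # assumption: caret is installed

# same fake data as in the question
data <- data.frame(
  measured = factor(rep(c("A", "B", "C"), c(30, 40, 30))),
  modeled  = factor(rep(c("A", "B", "C", "A"), c(30, 10, 20, 40)))
)

# identical calls except for the value of `positive`
cm_a <- confusionMatrix(data$modeled, data$measured, positive = "A")
cm_c <- confusionMatrix(data$modeled, data$measured, positive = "C")

# with three classes, neither the per-class nor the overall
# statistics should depend on `positive`
all.equal(cm_a$byClass, cm_c$byClass)
all.equal(cm_a$overall, cm_c$overall)
```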
As you said, it's not hard to do by hand. For a quick look, you can simply use
table(data.frame(data == "A"))
# modeled
# measured FALSE TRUE
# FALSE 30 40
# TRUE 0 30
where TRUE marks the positive class A and FALSE marks everything else.
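If you want the full confusionMatrix output for the binary case, one option is to collapse both factors to two levels first, so that the two-class branch (where positive is honoured) applies. The following is a minimal sketch assuming caret is installed; to_binary is a hypothetical helper written for this example, not part of caret, and "other" is an arbitrary label for the negative class.

```r
library(caret)  # assumption: caret is installed

# same fake data as in the question
data <- data.frame(
  measured = factor(rep(c("A", "B", "C"), c(30, 40, 30))),
  modeled  = factor(rep(c("A", "B", "C", "A"), c(30, 10, 20, 40)))
)

# hypothetical helper: collapse a factor to `positive` vs. "other"
to_binary <- function(x, positive) {
  factor(ifelse(x == positive, positive, "other"),
         levels = c(positive, "other"))
}

# two-level factors, so `positive` is actually used
cm_bin <- confusionMatrix(
  to_binary(data$modeled, "A"),
  to_binary(data$measured, "A"),
  positive = "A"
)

cm_bin$table
cm_bin$byClass[c("Sensitivity", "Specificity")]
```

Sensitivity and specificity here should agree with the Class: A column of the multiclass output in the question (1.0000 and 0.4286), because caret computes the per-class statistics by comparing each level against all remaining levels.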