I am working with a confusion matrix and have a very basic understanding of the output. However, as I am new to this, and to R, the detailed explanations I find often make it sound more complicated than it is. I have the output below and I am just wondering if it could be explained to me.
What's the difference between the accuracy and the kappa?
> confusionMatrix(predRF, loanTest2$grade)
Confusion Matrix and Statistics
          Reference
Prediction    A    B    C    D    E    F    G
         A 2298  174   63   29   26   12    3
         B  264 3245  301   65   16    3    3
         C    5  193 2958  399   61   15    4
         D    1    1   39 1074  236   33    6
         E    0    0    2   32  249   97   30
         F    0    0    0    0    8   21   11
         G    0    0    0    0    0    0    0
Overall Statistics

              Accuracy : 0.822
                95% CI : (0.815, 0.8288)
   No Information Rate : 0.3017
   P-Value [Acc > NIR] : < 2.2e-16

                 Kappa : 0.7635
Statistics by Class:

                     Class: A Class: B Class: C Class: D Class: E Class: F Class: G
Sensitivity            0.8949   0.8981   0.8796  0.67167  0.41779 0.116022 0.000000
Specificity            0.9674   0.9220   0.9214  0.96955  0.98585 0.998389 1.000000
Pos Pred Value         0.8821   0.8327   0.8138  0.77266  0.60732 0.525000      NaN
Neg Pred Value         0.9712   0.9545   0.9515  0.95041  0.97000 0.986596 0.995241
Prevalence             0.2144   0.3017   0.2808  0.13351  0.04976 0.015112 0.004759
Detection Rate         0.1919   0.2709   0.2470  0.08967  0.02079 0.001753 0.000000
Detection Prevalence   0.2175   0.3254   0.3035  0.11606  0.03423 0.003340 0.000000
Balanced Accuracy      0.9311   0.9101   0.9005  0.82061  0.70182 0.557206 0.500000
Let's say this is your confusion matrix, with predictions in the rows and the reference (true grade) in the columns, just like in your output:
tab = structure(list(A = c(2298L, 264L, 5L, 1L, 0L, 0L, 0L), B = c(174L,
3245L, 193L, 1L, 0L, 0L, 0L), C = c(63L, 301L, 2958L, 39L, 2L,
0L, 0L), D = c(29L, 65L, 399L, 1074L, 32L, 0L, 0L), E = c(26L,
16L, 61L, 236L, 249L, 8L, 0L), F = c(12L, 3L, 15L, 33L, 97L,
21L, 0L), G = c(3L, 3L, 4L, 6L, 30L, 11L, 0L)), class = "data.frame", row.names = c("A",
"B", "C", "D", "E", "F", "G"))
You need to go label by label: for class A, for example, the per-class statistics are computed by treating the problem as "A versus everything else", so TP/FP/FN/TN are defined with respect to predictions of A.
A_confusion_matrix = cbind(c(tab[1,1], sum(tab[-1,1])), c(sum(tab[1,-1]), sum(tab[2:7,2:7])))
A_confusion_matrix
[,1] [,2]
[1,] 2298 307
[2,] 270 9102
The numbers above are obtained by collapsing everything that is not A into a single "not A" category, on both the prediction side and the reference side.
And these numbers represent:
A_confusion_matrix[1,1] is the number that are predicted A and truly A -> TP
A_confusion_matrix[1,2] is the number that are predicted A but not A -> FP
A_confusion_matrix[2,1] is the number that are not predicted A but A -> FN
A_confusion_matrix[2,2] is the number that are not predicted A and not A -> TN
From here you can, for example, calculate the sensitivity for A, which is TP/(TP+FN) = 2298/(2298+270) = 0.8948598, matching the 0.8949 reported for Class: A.
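The rest of the Class: A column works the same way. Here is a minimal sketch, reusing the 2x2 matrix built above (the names just mirror caret's labels), that reproduces those numbers:

TP = A_confusion_matrix[1, 1]  # 2298: predicted A, truly A
FP = A_confusion_matrix[1, 2]  #  307: predicted A, truly not A
FN = A_confusion_matrix[2, 1]  #  270: predicted not A, truly A
TN = A_confusion_matrix[2, 2]  # 9102: predicted not A, truly not A
n  = TP + FP + FN + TN         # 11977 observations in total

c(Sensitivity         = TP / (TP + FN),                         # 0.8949
  Specificity         = TN / (TN + FP),                         # 0.9674
  PosPredValue        = TP / (TP + FP),                         # 0.8821
  NegPredValue        = TN / (TN + FN),                         # 0.9712
  Prevalence          = (TP + FN) / n,                          # 0.2144
  DetectionRate       = TP / n,                                 # 0.1919
  DetectionPrevalence = (TP + FP) / n,                          # 0.2175
  BalancedAccuracy    = (TP / (TP + FN) + TN / (TN + FP)) / 2)  # 0.9311

Incidentally, the NaN for Pos Pred Value of class G appears because nothing was ever predicted as G (row G is all zeros), so TP + FP is 0 and 0/0 is NaN.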
It is Cohen's kappa, basically a metric that measures how good your predictions are compared to random guessing / assignment. It is defined as kappa = (p_observed - p_expected) / (1 - p_expected), where p_observed is the observed agreement (your overall accuracy) and p_expected is the agreement you would expect by chance given the class frequencies.
As you can see from that formula, it makes a huge difference when your dataset is imbalanced. For example, if 90% of your labels belong to one class and the model predicts everything as that class, you get 90% accuracy; with Cohen's kappa, however, p_expected is already 0.9, so the model has to do better than that to get a good score.
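To make that concrete, here is a minimal sketch that reproduces both the accuracy and the kappa from the tab defined above (p_observed and p_expected are just illustrative names):

m = as.matrix(tab)
n = sum(m)                                        # 11977 observations

p_observed = sum(diag(m)) / n                     # overall accuracy = 0.822
p_expected = sum(rowSums(m) * colSums(m)) / n^2   # chance agreement from the marginals, about 0.247

kappa = (p_observed - p_expected) / (1 - p_expected)  # 0.7635

Both numbers match your output, and you can see that the larger the share of the accuracy that is explained by chance (p_expected), the lower the kappa gets.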