How do I analyze the confusion matrix in Weka with regard to the accuracy obtained? We know that accuracy can be misleading on imbalanced data sets. How does the confusion matrix "confirm" the accuracy?
Examples: a) accuracy: 96.1728 %
a b c d e f g <-- classified as
124 0 0 0 1 0 0 | a = brickface
0 110 0 0 0 0 0 | b = sky
1 0 119 0 2 0 0 | c = foliage
1 0 0 107 2 0 0 | d = cement
1 0 12 7 105 0 1 | e = window
0 0 0 0 0 94 0 | f = path
0 0 1 0 0 2 120 | g = grass
b) accuracy: 96.8 %
a b c d e f g <-- classified as
202 0 0 0 3 0 0 | a = brickface
0 220 0 0 0 0 0 | b = sky
0 0 198 0 10 0 0 | c = foliage
0 0 1 202 16 1 0 | d = cement
2 0 11 2 189 0 0 | e = window
0 0 0 2 0 234 0 | f = path
0 0 0 0 0 0 207 | g = grass
etc...
The accuracy is computed by summing the instances on the main diagonal and dividing by the total number of instances (the sum of all entries of the confusion matrix). For instance, in a) the diagonal sums to 124 + 110 + ... + 120 = 779, the total number of instances (summing everything) is 810, so the accuracy is 779/810 = 0.9617 => 96.17%.
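As a sanity check, here is a minimal sketch in plain Python (no Weka API involved; the variable names are just for illustration) that reproduces these numbers from matrix a):

# Minimal sketch: accuracy from a confusion matrix, using matrix a) above.
confusion = [
    [124, 0,   0,   0,   1,   0,   0],    # a = brickface
    [0,   110, 0,   0,   0,   0,   0],    # b = sky
    [1,   0,   119, 0,   2,   0,   0],    # c = foliage
    [1,   0,   0,   107, 2,   0,   0],    # d = cement
    [1,   0,   12,  7,   105, 0,   1],    # e = window
    [0,   0,   0,   0,   0,   94,  0],    # f = path
    [0,   0,   1,   0,   0,   2,   120],  # g = grass
]

correct = sum(confusion[i][i] for i in range(len(confusion)))  # main diagonal
total = sum(sum(row) for row in confusion)                     # all entries
print(correct, total, correct / total)  # 779 810 0.9617... -> 96.17 %

Running the same computation on matrix b) gives 1452/1500 = 0.968 => 96.8%, matching the reported accuracy.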
Your datasets are rather balanced (all classes have approximately the same number of instances). You can see that a dataset is imbalanced when the sum of one row is much bigger than the sum of the other rows, as rows represent actual classes. For instance:
a b <-- classified as
1000 20 | a = class1
10 10 | b = class2
In this case, class1 has 1020 instances and class2 has only 20, so the problem is highly imbalanced. This will impact classifier performance, as learning algorithms typically try to maximize the accuracy (or minimize the error), so a trivial classifier such as the rule "for any X, set class = class1" will have an accuracy of 1020/1040 ≈ 0.9808 => 98.08%.
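Again a minimal sketch in plain Python (no Weka API; names are illustrative) showing how the row sums expose the imbalance and what the trivial rule scores:

# Minimal sketch: detecting imbalance from row sums, and the accuracy of
# the trivial "always predict class1" rule, using the 2x2 example above.
confusion = [
    [1000, 20],  # actual class1
    [10,   10],  # actual class2
]

row_sums = [sum(row) for row in confusion]  # instances per actual class
print(row_sums)                             # [1020, 20] -> highly imbalanced

# A classifier that always predicts class1 is right exactly on the
# actual class1 instances (the whole first row).
trivial_accuracy = row_sums[0] / sum(row_sums)
print(trivial_accuracy)                     # 1020/1040 = 0.9807...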