Weka confusion matrix and accuracy analysis


How do I analyze the confusion matrix in Weka with regard to the accuracy obtained? We know that accuracy can be misleading on imbalanced data sets. How does the confusion matrix "confirm" the accuracy?

Examples:

a) Accuracy: 96.1728 %

   a   b   c   d   e   f   g   <-- classified as
 124   0   0   0   1   0   0 |   a = brickface
   0 110   0   0   0   0   0 |   b = sky
   1   0 119   0   2   0   0 |   c = foliage
   1   0   0 107   2   0   0 |   d = cement
   1   0  12   7 105   0   1 |   e = window
   0   0   0   0   0  94   0 |   f = path
   0   0   1   0   0   2 120 |   g = grass

b) Accuracy: 96.8 %

   a   b   c   d   e   f   g   <-- classified as
 202   0   0   0   3   0   0 |   a = brickface
   0 220   0   0   0   0   0 |   b = sky
   0   0 198   0  10   0   0 |   c = foliage
   0   0   1 202  16   1   0 |   d = cement
   2   0  11   2 189   0   0 |   e = window
   0   0   0   2   0 234   0 |   f = path
   0   0   0   0   0   0 207 |   g = grass

etc...


Solution

  • The accuracy is computed by summing all instances on the main diagonal (the correctly classified ones) and dividing by the total number of instances (the sum of all cells in the confusion matrix). For instance, in a), you get 124 + 110 + ... + 120 = 779, and the total number of instances is 810, so the accuracy is 779 / 810 = 0.9617, i.e. 96.17%.

    Your datasets are fairly balanced (all the classes have approximately the same number of instances). You can tell that a dataset is imbalanced when the sum of one row is much larger than the sums of the other rows, because rows represent the actual classes. For instance:

       a    b   <-- classified as
    1000   20 | a = class1
      10   10 | b = class2

    In this case, class1 has 1020 instances and class2 has only 20, so the problem is highly imbalanced. This affects classifier performance, because learning algorithms typically try to maximize accuracy (or minimize the error), so a trivial classifier such as the rule "for any X, set class = class1" will have an accuracy of 1020/1040 = 0.9808.
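As a quick check of the diagonal-over-total rule from the answer, the minimal Python sketch below recomputes both accuracies from the matrices in the question. It is an illustration, not Weka code: `MATRIX_A`, `MATRIX_B` and `accuracy` are hypothetical names, and the matrices are assumed to be transcribed with rows as actual classes and columns as predicted classes, as in Weka's output.

```python
# Confusion matrices transcribed from the question (hypothetical names).
# Rows are actual classes, columns are predicted classes, both in order a..g.
MATRIX_A = [
    [124, 0, 0, 0, 1, 0, 0],
    [0, 110, 0, 0, 0, 0, 0],
    [1, 0, 119, 0, 2, 0, 0],
    [1, 0, 0, 107, 2, 0, 0],
    [1, 0, 12, 7, 105, 0, 1],
    [0, 0, 0, 0, 0, 94, 0],
    [0, 0, 1, 0, 0, 2, 120],
]

MATRIX_B = [
    [202, 0, 0, 0, 3, 0, 0],
    [0, 220, 0, 0, 0, 0, 0],
    [0, 0, 198, 0, 10, 0, 0],
    [0, 0, 1, 202, 16, 1, 0],
    [2, 0, 11, 2, 189, 0, 0],
    [0, 0, 0, 2, 0, 234, 0],
    [0, 0, 0, 0, 0, 0, 207],
]


def accuracy(matrix: list[list[int]]) -> float:
    """Correctly classified instances (main diagonal) divided by all instances."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


if __name__ == "__main__":
    print(f"a) {accuracy(MATRIX_A):.4%}")  # 779 / 810
    print(f"b) {accuracy(MATRIX_B):.4%}")  # 1452 / 1500
```

Running it prints `a) 96.1728%` and `b) 96.8000%`, matching the accuracies Weka reported in the question.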
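To judge balance the way the answer suggests, you can sum each row (one actual class) and compare the largest and smallest class sizes. This is a minimal sketch under stated assumptions: `class_counts` and `imbalance_ratio` are hypothetical helpers, and the max/min ratio is just one simple way to summarize imbalance, not a statistic Weka reports.

```python
# Hypothetical helpers for inspecting class balance in a confusion matrix.
# Rows are actual classes, so each row sum is the number of instances of that class.

MATRIX_A = [  # example a) from the question, classes a..g
    [124, 0, 0, 0, 1, 0, 0],
    [0, 110, 0, 0, 0, 0, 0],
    [1, 0, 119, 0, 2, 0, 0],
    [1, 0, 0, 107, 2, 0, 0],
    [1, 0, 12, 7, 105, 0, 1],
    [0, 0, 0, 0, 0, 94, 0],
    [0, 0, 1, 0, 0, 2, 120],
]

TWO_CLASS = [  # the two-class example from the answer
    [1000, 20],
    [10, 10],
]


def class_counts(matrix: list[list[int]]) -> list[int]:
    """Number of instances of each actual class (one sum per row)."""
    return [sum(row) for row in matrix]


def imbalance_ratio(matrix: list[list[int]]) -> float:
    """Largest class size divided by smallest; 1.0 means perfectly balanced.

    Assumes every class has at least one instance (no all-zero rows).
    """
    counts = class_counts(matrix)
    return max(counts) / min(counts)


if __name__ == "__main__":
    print(class_counts(MATRIX_A), round(imbalance_ratio(MATRIX_A), 2))
    print(class_counts(TWO_CLASS), imbalance_ratio(TWO_CLASS))
```

For matrix a) this gives class sizes of 125, 110, 122, 110, 126, 94 and 123 (ratio about 1.34), whereas the two-class example gives 1020 and 20 (ratio 51.0).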
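The trivial-classifier argument can also be read directly from the matrix: always predicting the largest class scores (largest row sum) / (total). For a nominal class this is roughly the baseline Weka's ZeroR gives, since ZeroR always predicts the most frequent training class. The minimal sketch below, with hypothetical function names, compares that baseline with the classifier in the two-class example.

```python
# Compare a classifier's accuracy with the trivial "always predict the
# majority class" rule, both computed from the same confusion matrix.

TWO_CLASS = [
    [1000, 20],  # actual class1: 1000 predicted class1, 20 predicted class2
    [10, 10],    # actual class2: 10 predicted class1, 10 predicted class2
]


def accuracy(matrix: list[list[int]]) -> float:
    """Main-diagonal sum divided by the sum of all cells."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def majority_baseline_accuracy(matrix: list[list[int]]) -> float:
    """Accuracy of always predicting the class with the largest row sum."""
    counts = [sum(row) for row in matrix]
    return max(counts) / sum(counts)


if __name__ == "__main__":
    print(f"classifier:    {accuracy(TWO_CLASS):.4f}")                    # 1010 / 1040
    print(f"always class1: {majority_baseline_accuracy(TWO_CLASS):.4f}")  # 1020 / 1040
```

It prints 0.9712 for the classifier and 0.9808 for the always-class1 rule. The trivial rule scores higher even though it never recognizes a single class2 instance, while the classifier recovers 10 of the 20 class2 instances (the second row). This is why looking at the rows of the confusion matrix, not just the single accuracy figure, is informative on imbalanced data.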