
Differences in calculations of NB precision / recall avg / total scores

I am conducting text classification analyses and ran NB-based classifiers, which generated the following results:

Classification Report:
             precision    recall  f1-score   support

          0       0.00      0.00      0.00         2
          1       0.67      1.00      0.80         4

avg / total       0.44      0.67      0.53         6

Classification Report:
             precision    recall  f1-score   support

          0       0.00      0.00      0.00         0
          1       1.00      0.83      0.91         6

avg / total       1.00      0.83      0.91         6

What puzzles me is the following: why are the avg / total scores calculated differently? Why is the avg / total row in the second table just a copy of the precision / recall results for class 1? Is it because there were no class 0 test instances?




  • The scores are calculated the same way in both cases:

    Ex. 1: f1 (class 1) = 2 * 0.67 * 1.00 / (0.67 + 1.00) = 0.80
           average f1   = 2 * 0.44 * 0.67 / (0.44 + 0.67) = 0.53
    Ex. 2: f1 (class 1) = 2 * 1.00 * 0.83 / (1.00 + 0.83) = 0.91
           average f1   = 2 * 1.00 * 0.83 / (1.00 + 0.83) = 0.91

    The problem you are facing here is called Simpson's paradox: a result seen in the separate groups (classes 0 and 1) changes when the groups are combined (the average). Check the Wikipedia page; it has a good example and explanation.


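    Recall / precision average calculation in the first case (each class is weighted by its support):

    Av. precision = (0.67 * 4 + 0.00 * 2) / (4 + 2) = 0.44
    Av. recall    = (1.00 * 4 + 0.00 * 2) / (4 + 2) = 0.67

    In the second case class 0 has a support of 0, so it gets zero weight and the average is just the class 1 row:

    Av. precision = (1.00 * 6 + 0.00 * 0) / (6 + 0) = 1.00
    Av. recall    = (0.83 * 6 + 0.00 * 0) / (6 + 0) = 0.83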
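
The support-weighted averaging above can be sketched in a few lines of plain Python. This is only an illustration, not the library's own code: the helper name `weighted_average` is hypothetical, and the unrounded per-class values (2/3, 5/6, 10/11) are an assumption inferred from the rounded figures in the two reports. For these numbers, the support-weighted mean of the per-class f1 scores comes out at the same 0.53 and 0.91 as the formulas above.

```python
def weighted_average(scores, supports):
    """Support-weighted mean of per-class scores (hypothetical helper)."""
    total = sum(supports)
    if total == 0:
        raise ValueError("no test instances at all")
    return sum(s * n for s, n in zip(scores, supports)) / total


# Per-class scores, listed as [class 0, class 1].
# Assumption: 0.67, 0.83 and 0.91 in the reports are rounded 2/3, 5/6 and 10/11.
report_1 = {"precision": [0.0, 2 / 3], "recall": [0.0, 1.0], "f1": [0.0, 0.8]}
support_1 = [2, 4]

report_2 = {"precision": [0.0, 1.0], "recall": [0.0, 5 / 6], "f1": [0.0, 10 / 11]}
support_2 = [0, 6]  # no class 0 test instances, so class 0 gets zero weight

avg_1 = {name: round(weighted_average(vals, support_1), 2)
         for name, vals in report_1.items()}
avg_2 = {name: round(weighted_average(vals, support_2), 2)
         for name, vals in report_2.items()}

print(avg_1)
print(avg_2)
```

With these inputs the two dictionaries reproduce the avg / total rows of the reports: the first mixes both classes in a 2:4 ratio, while the second is identical to the class 1 row because class 0 contributes nothing to the weighted sum.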