Tags: classification, information-retrieval, confusion-matrix

Multi-class classifier evaluation


I am reading up on classifiers, particularly multi-class classifiers. When I evaluate a classifier using precision and recall, I don't understand what False Positive and False Negative mean in multi-class classifier evaluation.

For example, suppose I classify a document whose real category is C-1, and the classifier labels it C-2. Should I then increment the false positive count for C-2 and the false negative count for C-1 (since the real answer is C-1)?


Solution

  • Since the example you have given is a two-class problem, I will explain False Positive and False Negative in the context of your example.

    In the 2-class case, the confusion matrix usually looks like the following:

           | Declare C-1 | Declare C-2 |
    |Is C-1|    TP       |   FN        |
    |Is C-2|    FP       |   TN        |
    

    where the notations I've used means the following:

    • TP = true positive (classified as C-1 and actually is C-1)
    • FN = false negative (classified as C-2 but actually is C-1)
    • FP = false positive (classified as C-1 but actually is C-2)
    • TN = true negative (classified as C-2 and actually is C-2)

    From the raw data, the values in the table are typically the counts of each outcome over the test data. From these, we can compute precision, recall, and other metrics.
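    The tallying described above can be sketched in Python. This is a minimal illustration, not a library API; the function name and label values are assumptions for the example:

```python
from collections import Counter

def confusion_counts(actual, predicted, pos="C-1", neg="C-2"):
    """Tally the four 2-class confusion-matrix cells from parallel lists
    of actual and declared labels, treating `pos` as the positive class."""
    pairs = Counter(zip(actual, predicted))  # (actual, declared) -> count
    return {
        "TP": pairs[(pos, pos)],  # declared C-1, actually C-1
        "FN": pairs[(pos, neg)],  # declared C-2, actually C-1
        "FP": pairs[(neg, pos)],  # declared C-1, actually C-2
        "TN": pairs[(neg, neg)],  # declared C-2, actually C-2
    }
```

    Each test document contributes to exactly one of the four cells, so the four counts always sum to the size of the test set.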

    For example, suppose you have the following table:

           | Declare C-1 | Declare C-2 |
    |Is C-1|    12       |    6        |
    |Is C-2|     8       |   11        |
    

    The above table represents the following information:

    • 12 documents are classified as C-1 and they actually belong to C-1.
    • 6 documents are classified as C-2 but they actually belong to C-1.
    • 8 documents are classified as C-1 but they actually belong to C-2.
    • 11 documents are classified as C-2 and they actually belong to C-2.

    For category C-1:

    Precision = 12 / (12 + 8)
    Recall = 12 / (12 + 6)
    

    For category C-2:

    Precision = 11 / (11 + 6)
    Recall = 11 / (11 + 8)
    
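    As a check on the arithmetic above, here is a small sketch that computes both metrics from the table's counts (the helper name is an assumption for this example):

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# Treating C-1 as positive: TP=12, FP=8 (C-2 declared C-1), FN=6 (C-1 declared C-2)
p1, r1 = precision_recall(12, 8, 6)   # 12/20 = 0.6, 12/18 ≈ 0.667

# Treating C-2 as positive: TP=11, FP=6, FN=8
p2, r2 = precision_recall(11, 6, 8)   # 11/17 ≈ 0.647, 11/19 ≈ 0.579
```

    Note that which cells count as FP and FN swaps when you switch the class being treated as "positive".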

    For example, when I classify a document (whose real category is C-1) and the classifier classifies it as C-2, should I then increase the false positive count at C-2 and the false negative count at C-1 (since the real answer is C-1)?

    You should increment the cell of the confusion matrix associated with Declare C-2 and Is C-1, indicated below with an *:

           | Declare C-1 | Declare C-2 |
    |Is C-1|     0       |    0*       |
    |Is C-2|     0       |    0        |
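The same bookkeeping extends directly to more than two classes: keep one row per actual class and one column per declared class, increment `matrix[actual][declared]` for each document, then read each class's recall off its row and its precision off its column. A minimal sketch, with function names chosen for this example:

```python
from collections import defaultdict

def update(matrix, actual, declared):
    """Record one classification: row = actual class, column = declared class."""
    matrix[actual][declared] += 1

def per_class_metrics(matrix, cls, classes):
    """Precision = diagonal / column total (everything declared `cls`);
    recall = diagonal / row total (everything actually `cls`)."""
    tp = matrix[cls][cls]
    col_total = sum(matrix[a].get(cls, 0) for a in classes)
    row_total = sum(matrix[cls].get(p, 0) for p in classes)
    return tp / col_total, tp / row_total

matrix = defaultdict(lambda: defaultdict(int))
# The question's example: a C-1 document declared as C-2
# (a false negative for C-1 and a false positive for C-2).
update(matrix, "C-1", "C-2")
```

With this layout, the 2-class formulas above fall out as the special case where `classes` has two entries.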