Why are non-appearing classes shown in the classification report?


I'm working on NER and using sklearn.metrics.classification_report to calculate micro and macro F1 scores. It printed a table like:

              precision    recall  f1-score   support

           0     0.0000    0.0000    0.0000         0
           3     0.0000    0.0000    0.0000         0
           4     0.8788    0.9027    0.8906       257
           5     0.9748    0.9555    0.9650      1617
           6     0.9862    0.9888    0.9875      1156
           7     0.9339    0.9138    0.9237       835
           8     0.8542    0.7593    0.8039       216
           9     0.8945    0.8575    0.8756       702
          10     0.9428    0.9382    0.9405      1668
          11     0.9234    0.9139    0.9186      1661

    accuracy                         0.9285      8112
   macro avg     0.7388    0.7230    0.7305      8112
weighted avg     0.9419    0.9285    0.9350      8112

Evidently the predicted labels include '0' and '3', but neither appears in the true labels. Why does the classification report show these two classes when they have no samples? And how can I prevent zero-support classes from being shown? These two classes seem to have a large impact on the macro F1 score.


Solution

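  • By default, classification_report covers every label that appears in either y_true or y_pred, so a class that is only ever predicted still gets a row with a support of 0. You can use the following snippet to restrict the report to the labels present in y_true: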

    import numpy as np
    from sklearn.metrics import classification_report

    y_true = [0, 1, 2, 2, 2, 2]
    y_pred = [0, 0, 2, 2, 1, 42]
    # Report only on the labels that actually occur in y_true
    print(classification_report(y_true, y_pred, labels=np.unique(y_true)))
    

    This outputs:

                  precision    recall  f1-score   support
    
               0       0.50      1.00      0.67         1
               1       0.00      0.00      0.00         1
               2       1.00      0.50      0.67         4
    
       micro avg       0.60      0.50      0.55         6
       macro avg       0.50      0.50      0.44         6
    weighted avg       0.75      0.50      0.56         6
    

    As you can see, the label 42, which appears only in the predictions, is not shown because it has no support in y_true.
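To see how much the two zero-support rows affect the macro average in the question, the arithmetic can be redone by hand. The following is a minimal sketch in plain Python, assuming the per-class F1 scores and supports are copied from the report in the question. Because those scores are rounded to four decimals, the results are approximate. The names `f1_by_class`, `support_by_class`, and `macro_average` are illustrative only and are not part of scikit-learn.

```python
# Per-class F1 scores and supports, copied from the report in the question
# (rounded values, so the averages below are approximate).
f1_by_class = {
    0: 0.0000, 3: 0.0000, 4: 0.8906, 5: 0.9650, 6: 0.9875,
    7: 0.9237, 8: 0.8039, 9: 0.8756, 10: 0.9405, 11: 0.9186,
}
support_by_class = {
    0: 0, 3: 0, 4: 257, 5: 1617, 6: 1156,
    7: 835, 8: 216, 9: 702, 10: 1668, 11: 1661,
}


def macro_average(scores):
    """Unweighted mean of per-class scores (hypothetical helper)."""
    return sum(scores) / len(scores)


# Macro F1 over all ten reported classes, including the two with support 0.
macro_all = macro_average(list(f1_by_class.values()))

# Macro F1 over only the classes that occur in the true labels.
supported = [
    f1 for label, f1 in f1_by_class.items() if support_by_class[label] > 0
]
macro_supported = macro_average(supported)

print(f"macro F1, all reported classes:   {macro_all:.4f}")
print(f"macro F1, classes in y_true only: {macro_supported:.4f}")
```

The first value reproduces the 0.7305 shown in the question's report, while averaging over the eight supported classes gives roughly 0.913. The two rows with an F1 of 0 therefore pull the macro average down by about 0.18, whereas the weighted average is not affected by them because their weight (support) is 0.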