Tags: keras, scikit-learn, multilabel-classification, precision-recall

Is this the correct use of sklearn's classification_report for multi-label classification?


I am training a neural network with tf.keras. It is a multi-label classification problem where each sample belongs to multiple classes, e.g. [1, 0, 1, 0, ...]. The final model lines (just for clarity) are:

model.add(tf.keras.layers.Dense(9, activation='sigmoid'))  # final layer

model.compile(loss='binary_crossentropy', optimizer=optimizer,
              metrics=[tf.keras.metrics.BinaryAccuracy(),
                       tfa.metrics.F1Score(num_classes=9, average='macro', threshold=0.5)])

I need to generate precision, recall, and F1 scores for these classes (I already get the F1 score reported during training). For this I am using sklearn's classification_report, but I need to confirm that I am using it correctly in the multi-label setting.

from sklearn.metrics import classification_report
import numpy as np

pred = model.predict(x_test)
pred_one_hot = np.around(pred)  # threshold the sigmoid outputs at 0.5 to get binary indicator predictions

print(classification_report(one_hot_ground_truth, pred_one_hot))

This works fine and I get the full report for every class, including F1 scores that match the F1Score metric from TensorFlow Addons (for macro F1). Sorry this post is verbose, but what I am unsure about is:

Is it correct that the predictions need to be one-hot encoded (i.e. binarized) in the multi-label setting? If I pass in the raw prediction scores (sigmoid probabilities), an error is thrown.
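For reference, the np.around call above is equivalent to applying an explicit 0.5 threshold to the sigmoid outputs (up to how exact 0.5 values are rounded); a minimal sketch with made-up toy probabilities:

```python
import numpy as np

# toy per-class sigmoid probabilities for 2 samples and 3 classes
pred = np.array([[0.9, 0.2, 0.7],
                 [0.4, 0.8, 0.1]])

# explicit 0.5 threshold; same result as np.around(pred) for these values
pred_binary = (pred >= 0.5).astype(int)
print(pred_binary)  # rows: [1 0 1] and [0 1 0]
```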

thank you.


Solution

  • It is correct to use classification_report for binary, multi-class, and multi-label classification.

    The labels are not one-hot-encoded in the case of multi-class classification; they simply need to be either indices or labels.

    You can see that the two code snippets below yield the same output:

    Example with indices

    from sklearn.metrics import classification_report
    import numpy as np
    
    labels = np.array(['A', 'B', 'C'])

    y_true = np.array([1, 2, 0, 1, 2, 0])
    y_pred = np.array([1, 2, 1, 1, 1, 0])
    print(classification_report(y_true, y_pred, target_names=labels))
    

    Example with labels

    from sklearn.metrics import classification_report
    import numpy as np
    
    labels = np.array(['A', 'B', 'C'])
    
    y_true = labels[np.array([1, 2, 0, 1, 2, 0])]
    y_pred = labels[np.array([1, 2, 1, 1, 1, 0])]
    print(classification_report(y_true, y_pred))
    

    Both return

                  precision    recall  f1-score   support
    
               A       1.00      0.50      0.67         2
               B       0.50      1.00      0.67         2
               C       1.00      0.50      0.67         2
    
        accuracy                           0.67         6
       macro avg       0.83      0.67      0.67         6
    weighted avg       0.83      0.67      0.67         6
    

    In the context of multi-label classification, classification_report can be used as in the example below:

    from sklearn.metrics import classification_report
    import numpy as np
    
    labels = ['A', 'B', 'C']
    
    y_true = np.array([[1, 0, 1],
                       [0, 1, 0],
                       [1, 1, 1]])
    y_pred = np.array([[1, 0, 0],
                       [0, 1, 1],
                       [1, 1, 1]])
    
    print(classification_report(y_true, y_pred, target_names=labels))
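If you need the macro-averaged precision, recall, and F1 as plain numbers (for example, to compare against the tfa.metrics.F1Score value reported during training), classification_report can also return a dict via its output_dict parameter. A sketch reusing the same toy arrays as above:

```python
from sklearn.metrics import classification_report
import numpy as np

labels = ['A', 'B', 'C']

y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [1, 1, 1]])

report = classification_report(y_true, y_pred,
                               target_names=labels,
                               output_dict=True)

macro = report['macro avg']
# each value is 5/6 ≈ 0.833 for these toy arrays
print(macro['precision'], macro['recall'], macro['f1-score'])
```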