
How can I get metrics per label displayed in the transformers trainer?


How can I get the appropriate metrics (accuracy, F1, etc.) for each label?

I use the Trainer from Transformers: https://huggingface.co/docs/transformers/main_classes/trainer

I would like to get an output similar to sklearn.metrics.classification_report:

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html

Thanks for your help!


Solution

  • You can print the sklearn classification report during the training phase by adjusting the compute_metrics() function and passing it to the Trainer. As a small demo, you can change the function from the official Hugging Face example to the following:

    import numpy as np
    from sklearn.metrics import classification_report
    
    
    def compute_metrics(eval_pred):
        predictions, labels = eval_pred
        # `task` and `metric` are defined in the official example notebook;
        # for every task except the STS-B regression task, take the argmax
        # over the logits to get predicted class ids
        if task != "stsb":
            predictions = np.argmax(predictions, axis=1)
        else:
            predictions = predictions[:, 0]
    
        # print the per-label report in addition to the returned metrics
        print(classification_report(labels, predictions))
        return metric.compute(predictions=predictions, references=labels)
    

    After each epoch you get the following output:

                  precision    recall  f1-score   support
    
               0       0.76      0.36      0.49       322
               1       0.77      0.95      0.85       721
    
        accuracy                           0.77      1043
       macro avg       0.77      0.66      0.67      1043
    weighted avg       0.77      0.77      0.74      1043
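
    To wire this up, pass the function to the Trainer via its compute_metrics argument. A minimal sketch, assuming model, args, train_dataset, and eval_dataset are defined as in the example notebook:

    from transformers import Trainer
    
    trainer = Trainer(
        model,
        args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics,  # the report is printed at every evaluation
    )
    trainer.train()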
    

    For more fine-grained control during the training phase, you can also define a callback to customise the behaviour of the training loop at its different states.

    from transformers import Trainer, TrainerCallback
    
    
    class PrintClassificationCallback(TrainerCallback):
        # on_evaluate fires after each evaluation phase; the Trainer passes
        # the computed metrics in via the `metrics` keyword argument
        def on_evaluate(self, args, state, control, metrics=None, **kwargs):
            print("Called after evaluation phase")
    
    trainer = Trainer(
        model,
        args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        callbacks=[PrintClassificationCallback]
    )
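
    If you prefer to build the report outside the training loop, you can also run the trained Trainer on the evaluation set directly. A minimal sketch, assuming the trainer and eval_dataset from above and a plain classification task:

    import numpy as np
    from sklearn.metrics import classification_report
    
    # Trainer.predict returns a PredictionOutput whose .predictions holds
    # the raw logits and whose .label_ids holds the gold labels
    output = trainer.predict(eval_dataset)
    y_pred = np.argmax(output.predictions, axis=1)
    print(classification_report(output.label_ids, y_pred))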
    

    After the training phase you can also load your trained model into a text classification pipeline, pass one or more samples to it, and get the corresponding predicted labels. For example:

    from transformers import pipeline
    from sklearn.metrics import classification_report
    
    
    text_classification_pipeline = pipeline("text-classification", model="MyFinetunedModel")
    
    X = ["This is a cat sentence", "This is a dog sentence", "This is a fish sentence"]
    y_act = ["LABEL_1", "LABEL_2", "LABEL_3"]
    labels = ["LABEL_1", "LABEL_2", "LABEL_3"]
    
    y_pred = [result["label"] for result in text_classification_pipeline(X)]
    
    # classification_report expects the true labels first, then the predictions
    print(classification_report(y_act, y_pred, labels=labels))
    

    Output (here the model predicted LABEL_1 for all three samples):

                  precision    recall  f1-score   support
    
         LABEL_1       0.33      1.00      0.50         1
         LABEL_2       0.00      0.00      0.00         1
         LABEL_3       0.00      0.00      0.00         1
    
        accuracy                           0.33         3
       macro avg       0.11      0.33      0.17         3
    weighted avg       0.11      0.33      0.17         3
    

    Hope it helps.