
How can I get metrics per label displayed in the transformers trainer?


How can I get the appropriate metrics (accuracy, F1, etc.) for each label?

I use the Trainer from Transformers: https://huggingface.co/docs/transformers/main_classes/trainer

I would like to get an output similar to sklearn.metrics.classification_report:

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html

Thanks for your help!


Solution

  • You can print the sklearn classification report during the training phase by adjusting the compute_metrics() function and passing it to the Trainer. As a small demo, you can change the function from the official Hugging Face example to the following:

    import numpy as np
    from sklearn.metrics import classification_report
    
    
    def compute_metrics(eval_pred):
        predictions, labels = eval_pred
        # `task` and `metric` are defined in the official example notebook;
        # for every task except the STS-B regression task, take the argmax
        # over the logits to get predicted class ids
        if task != "stsb":
            predictions = np.argmax(predictions, axis=1)
        else:
            predictions = predictions[:, 0]
    
        # print the per-label report in addition to the returned metrics
        print(classification_report(labels, predictions))
        return metric.compute(predictions=predictions, references=labels)
    

    After each epoch you get the following output:

                  precision    recall  f1-score   support
    
               0       0.76      0.36      0.49       322
               1       0.77      0.95      0.85       721
    
        accuracy                           0.77      1043
       macro avg       0.77      0.66      0.67      1043
    weighted avg       0.77      0.77      0.74      1043
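
    To wire this up, pass the function to the Trainer via its compute_metrics argument. A minimal sketch, assuming model, args, train_dataset, and eval_dataset are defined as in the example notebook:

    from transformers import Trainer
    
    trainer = Trainer(
        model,
        args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics,  # the report is printed at every evaluation
    )
    trainer.train()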
    

    For more fine-grained control during the training phase, you can also define a callback to customise the behaviour of the training loop at its different states.

    from transformers import Trainer, TrainerCallback
    
    
    class PrintClassificationCallback(TrainerCallback):
        # on_evaluate fires after each evaluation phase; the Trainer passes
        # the computed metrics in via the `metrics` keyword argument
        def on_evaluate(self, args, state, control, metrics=None, **kwargs):
            print("Called after evaluation phase")
    
    trainer = Trainer(
        model,
        args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        callbacks=[PrintClassificationCallback]
    )
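
    If you prefer to build the report outside the training loop, you can also run the trained Trainer on the evaluation set directly. A minimal sketch, assuming the trainer and eval_dataset from above and a plain classification task:

    import numpy as np
    from sklearn.metrics import classification_report
    
    # Trainer.predict returns a PredictionOutput whose .predictions holds
    # the raw logits and whose .label_ids holds the gold labels
    output = trainer.predict(eval_dataset)
    y_pred = np.argmax(output.predictions, axis=1)
    print(classification_report(output.label_ids, y_pred))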
    

    After the training phase you can also load your trained model into a text classification pipeline, pass one or more samples to it, and get the corresponding predicted labels. For example:

    from transformers import pipeline
    from sklearn.metrics import classification_report
    
    
    text_classification_pipeline = pipeline("text-classification", model="MyFinetunedModel")
    
    X = ["This is a cat sentence", "This is a dog sentence", "This is a fish sentence"]
    y_act = ["LABEL_1", "LABEL_2", "LABEL_3"]
    labels = ["LABEL_1", "LABEL_2", "LABEL_3"]
    
    y_pred = [result["label"] for result in text_classification_pipeline(X)]
    
    # classification_report expects the true labels first, then the predictions
    print(classification_report(y_act, y_pred, labels=labels))
    

    Output (here the model predicted LABEL_1 for all three samples):

                  precision    recall  f1-score   support
    
         LABEL_1       0.33      1.00      0.50         1
         LABEL_2       0.00      0.00      0.00         1
         LABEL_3       0.00      0.00      0.00         1
    
        accuracy                           0.33         3
       macro avg       0.11      0.33      0.17         3
    weighted avg       0.11      0.33      0.17         3
    

    Hope it helps.