How can I get the appropriate metrics (accuracy, F1, etc.) for each label?
I use the Trainer from Transformers. https://huggingface.co/docs/transformers/main_classes/trainer
I would like to have an output similar to that of sklearn.metrics.classification_report:
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html
Thanks for your help!
You can print the sklearn classification report during the training phase by adjusting the compute_metrics() function and passing it to the Trainer. For a small demo, you can change the function in the official Hugging Face example to the following:
import numpy as np
from sklearn.metrics import classification_report

def compute_metrics(eval_pred):
    # `task` and `metric` are defined earlier in the official example notebook
    predictions, labels = eval_pred
    if task != "stsb":
        # Classification tasks: pick the highest-scoring class
        predictions = np.argmax(predictions, axis=1)
    else:
        # STS-B is a regression task: take the raw score
        predictions = predictions[:, 0]
    # Print the per-label report on every evaluation run
    print(classification_report(labels, predictions))
    return metric.compute(predictions=predictions, references=labels)
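For completeness, this is roughly how the function gets wired into the Trainer in that example; metric comes from the evaluate library and the function is passed via the compute_metrics argument (model, args, and the datasets are placeholders from the notebook):

import evaluate

task = "mrpc"  # any GLUE task from the example notebook
metric = evaluate.load("glue", task)

trainer = Trainer(
    model,
    args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,  # the function defined above
)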
After each epoch you get the following output:
              precision    recall  f1-score   support

           0       0.76      0.36      0.49       322
           1       0.77      0.95      0.85       721

    accuracy                           0.77      1043
   macro avg       0.77      0.66      0.67      1043
weighted avg       0.77      0.77      0.74      1043
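Printing is handy for eyeballing, but if you instead return the per-label numbers from compute_metrics, the Trainer will log and track them like any other metric. A minimal sketch using the output_dict option of classification_report (replacing the print-based version above, and assuming a plain classification task):

import numpy as np
from sklearn.metrics import classification_report

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    report = classification_report(labels, predictions, output_dict=True)
    # Flatten the nested dict (e.g. "0_f1-score") so the Trainer can log each
    # value; top-level floats such as "accuracy" are kept as-is
    flat = {}
    for key, value in report.items():
        if isinstance(value, dict):
            for metric_name, metric_value in value.items():
                flat[f"{key}_{metric_name}"] = metric_value
        else:
            flat[key] = value
    return flat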
For more fine-grained control during the training phase, you can also define a callback to customise the behaviour of the training loop at its different states.
from transformers import TrainerCallback

class PrintClassificationCallback(TrainerCallback):
    def on_evaluate(self, args, state, control, **kwargs):
        # Runs every time the Trainer finishes an evaluation phase
        print("Called after evaluation phase")

trainer = Trainer(
    model,
    args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    callbacks=[PrintClassificationCallback]
)
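If you want the callback itself to report something useful, the evaluation results are passed to on_evaluate through the keyword arguments. A small sketch (PrintMetricsCallback is a made-up name; the metrics dict contains the eval loss plus whatever your compute_metrics function returned):

class PrintMetricsCallback(TrainerCallback):
    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        # `metrics` holds eval_loss plus everything compute_metrics returned
        print(f"Evaluation at step {state.global_step}: {metrics}")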
After the training phase, you can also use your trained model in a classification pipeline to pass one or more samples to the model and get the corresponding predicted labels. For example:
from transformers import pipeline
from sklearn.metrics import classification_report

text_classification_pipeline = pipeline("text-classification", model="MyFinetunedModel")

X = ["This is a cat sentence", "This is a dog sentence", "This is a fish sentence"]
y_act = ["LABEL_1", "LABEL_2", "LABEL_3"]
labels = ["LABEL_1", "LABEL_2", "LABEL_3"]

y_pred = [result["label"] for result in text_classification_pipeline(X)]

# classification_report expects the true labels first, then the predictions
print(classification_report(y_act, y_pred, labels=labels))
Output (here the model predicted LABEL_1 for all three samples):
              precision    recall  f1-score   support

     LABEL_1       0.33      1.00      0.50         1
     LABEL_2       0.00      0.00      0.00         1
     LABEL_3       0.00      0.00      0.00         1

    accuracy                           0.33         3
   macro avg       0.11      0.33      0.17         3
weighted avg       0.11      0.33      0.17         3
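The generic LABEL_0/LABEL_1 names come from the model config; if you want readable class names in the report and in the pipeline output, you can set id2label and label2id when loading (or saving) the model. A sketch with hypothetical class names:

from transformers import AutoModelForSequenceClassification

# Hypothetical mapping; use your task's actual class names
id2label = {0: "cat", 1: "dog", 2: "fish"}
label2id = {label: i for i, label in id2label.items()}

model = AutoModelForSequenceClassification.from_pretrained(
    "MyFinetunedModel",
    id2label=id2label,
    label2id=label2id,
)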
Hope it helps.