
How to specify the loss function when finetuning a model using the Huggingface TFTrainer Class?


I have followed the basic example below, taken from https://huggingface.co/transformers/training.html:

from transformers import TFBertForSequenceClassification, TFTrainer, TFTrainingArguments

model = TFBertForSequenceClassification.from_pretrained("bert-large-uncased")

training_args = TFTrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=3,              # total # of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    per_device_eval_batch_size=64,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
)

trainer = TFTrainer(
    model=model,                         # the instantiated 🤗 Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=tfds_train_dataset,    # tensorflow_datasets training dataset
    eval_dataset=tfds_test_dataset       # tensorflow_datasets evaluation dataset
)
trainer.train()

But there seems to be no way to specify the loss function for the classifier. For example, if I fine-tune on a binary classification problem, I would use

tf.keras.losses.BinaryCrossentropy(from_logits=True)

and for a multi-class problem I would use

tf.keras.losses.CategoricalCrossentropy(from_logits=True)
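
For comparison, here is a minimal sketch of how I would pass the loss explicitly if I fine-tuned with plain Keras compile/fit instead of TFTrainer (assuming train_dataset is a placeholder for a tf.data.Dataset yielding (features, labels) pairs):

import tensorflow as tf
from transformers import TFBertForSequenceClassification

model = TFBertForSequenceClassification.from_pretrained("bert-large-uncased")

optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)  # loss chosen explicitly here
model.compile(optimizer=optimizer, loss=loss)

model.fit(train_dataset.shuffle(1000).batch(16), epochs=3)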

My setup is as follows:

transformers==4.3.2
tensorflow==2.3.1
python==3.6.12

Solution

  • Trainer has a compute_loss method that you can override to customize the loss.

    For more details, see the documentation:
    https://huggingface.co/docs/transformers/main_classes/trainer#:~:text=passed%20at%20init.-,compute_loss,-%2D%20Computes%20the%20loss

    Here is an example of how to customize the (PyTorch) Trainer to use a weighted loss (useful when you have an unbalanced training set):

    import torch
    from torch import nn
    from transformers import Trainer
    
    
    class CustomTrainer(Trainer):
        def compute_loss(self, model, inputs, return_outputs=False):
            labels = inputs.get("labels")
            # forward pass
            outputs = model(**inputs)
            logits = outputs.get("logits")
            # compute custom loss (suppose one has 3 labels with different weights)
            # put the class weights on the same device as the model to avoid a device mismatch on GPU
            loss_fct = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 3.0], device=model.device))
            loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
            return (loss, outputs) if return_outputs else loss
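
    You can then use CustomTrainer as a drop-in replacement for Trainer. A minimal sketch, assuming a 3-label PyTorch model and already-tokenized datasets (train_dataset and eval_dataset are placeholders):

    from transformers import BertForSequenceClassification, TrainingArguments

    # a PyTorch model with 3 labels, matching the 3 class weights above
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

    training_args = TrainingArguments(output_dir="./results", num_train_epochs=3)

    trainer = CustomTrainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,   # placeholder: your tokenized training set
        eval_dataset=eval_dataset,     # placeholder: your tokenized evaluation set
    )
    trainer.train()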