I have a huggingface model:
model_name = 'bert-base-uncased'
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=1).to(device)
How can I change the default classifier head, since it's only a single linear layer? I found an issue in the Hugging Face GitHub repository which said:
You can also replace self.classifier with your own model.
model = BertForSequenceClassification.from_pretrained("bert-base-multilingual-cased")
model.classifier = new_classifier
where new_classifier is any PyTorch model that you want.
However, I can't figure out what the structure of new_classifier should look like (in particular its inputs and outputs, so it can handle batches).
Looking at the source code of BertForSequenceClassification here, you can see that the classifier is simply a linear layer that projects the BERT pooled output from hidden_size dimensions down to num_labels dimensions. Suppose you want to replace the linear classifier with a two-layer MLP with a ReLU activation; you can do the following:
import torch.nn as nn

config = model.config  # the model's config holds hidden_size and num_labels

new_classifier = nn.Sequential(
    nn.Linear(config.hidden_size, config.hidden_size * 2),
    nn.ReLU(),
    nn.Linear(config.hidden_size * 2, config.num_labels),
)
model.classifier = new_classifier
The only requirement for your new classifier is that its input dimension must be config.hidden_size and its output dimension config.num_labels. The structure of the classifier doesn't depend on the batch size: modules like nn.Linear accept input of shape (*, H_in), where * is any number of leading dimensions, so you don't need to specify the batch size when creating the new classifier.
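To see this batch behavior concretely, here is a minimal, self-contained sketch using plain PyTorch (no Hugging Face model needed): the sizes 768 and 1 stand in for config.hidden_size and config.num_labels from the question, and the random tensor simulates the pooled BERT output for a batch of 4 examples.

```python
import torch
import torch.nn as nn

# Stand-ins for config.hidden_size and config.num_labels.
hidden_size, num_labels = 768, 1

new_classifier = nn.Sequential(
    nn.Linear(hidden_size, hidden_size * 2),
    nn.ReLU(),
    nn.Linear(hidden_size * 2, num_labels),
)

# nn.Linear applies to the last dimension and broadcasts over any
# leading dimensions, so a batch passes through transparently.
pooled = torch.randn(4, hidden_size)  # simulated pooled output, batch of 4
logits = new_classifier(pooled)
print(logits.shape)  # torch.Size([4, 1])
```

Because only the last dimension matters, the same classifier works unchanged for any batch size.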