I'm trying to develop a binary classifier with Hugging Face's BertModel and PyTorch. The classifier module is something like this:
import torch.nn as nn
from transformers import BertModel

class SSTClassifierModel(nn.Module):
    def __init__(self, num_classes=2, hidden_size=768):
        super(SSTClassifierModel, self).__init__()
        self.number_of_classes = num_classes
        self.dropout = nn.Dropout(0.01)
        self.hidden_size = hidden_size
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, input_ids, att_masks, token_type_ids, labels):
        # second element of the BERT output tuple is the pooled [CLS] embedding;
        # labels are accepted here but not used inside forward
        _, embedding = self.bert(input_ids, attention_mask=att_masks,
                                 token_type_ids=token_type_ids)
        output = self.classifier(self.dropout(embedding))
        return output
The way I train the model is as follows:
from torch.nn import BCELoss

loss_function = BCELoss()
model.train()
for epoch in range(NO_OF_EPOCHS):
    for step, batch in enumerate(train_dataloader):
        input_ids = batch[0].to(device)
        input_mask = batch[1].to(device)
        token_type_ids = batch[2].to(device)
        labels = batch[3].to(device)
        # assuming batch size = 3, labels is something like:
        # tensor([[0], [1], [1]])
        model.zero_grad()
        model_output = model(input_ids,
                             input_mask,
                             token_type_ids,
                             labels)
        # model output is something like this (with batch size = 3):
        # tensor([[ 0.3566, -0.0333],
        #         [ 0.1154,  0.2842],
        #         [-0.0016,  0.3767]], grad_fn=<AddmmBackward>)
        loss = loss_function(model_output.view(-1, 2), labels.view(-1))
I'm doing the .view()s because of Hugging Face's source code for BertForSequenceClassification here, which computes the loss in exactly the same way. But I get this error:
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in binary_cross_entropy(input, target, weight, size_average, reduce, reduction)
2068 if input.numel() != target.numel():
2069 raise ValueError("Target and input must have the same number of elements. target nelement ({}) "
-> 2070 "!= input nelement ({})".format(target.numel(), input.numel()))
2071
2072 if weight is not None:
ValueError: Target and input must have the same number of elements. target nelement (3) != input nelement (6)
Is there something wrong with my labels? Or with my model's output? I'm really stuck here. The documentation for PyTorch's BCELoss says:

Input: (N, *) where * means any number of additional dimensions
Target: (N, *), same shape as the input

How should I make my labels the same shape as the model output? I feel like there's something huge that I'm missing but I can't find it.
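To make the mismatch concrete, here is a minimal sketch that reproduces the error without BERT (the random tensor is just a stand-in for my (batch, 2) classifier output):

import torch
from torch.nn import BCELoss

loss_function = BCELoss()
model_output = torch.rand(3, 2)          # stand-in for the (3, 2) classifier output
labels = torch.tensor([[0], [1], [1]])   # my labels, shape (3, 1)

# input has 3 * 2 = 6 elements, target has 3 -> the same ValueError as above
loss = loss_function(model_output.view(-1, 2), labels.view(-1).float())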
A few observations:

- The Hugging Face code you refer to uses CrossEntropyLoss, but you are using BCELoss.
- CrossEntropyLoss takes prediction logits (size: (N, D)) and target labels (size: (N,)), whereas BCELoss takes p(y=1|x) (size: (N,)) and target labels (size: (N,)), since p(y=0|x) can be computed from p(y=1|x).
- CrossEntropyLoss expects logits, i.e. raw unnormalized scores, whereas BCELoss expects probability values in [0, 1]; the sketch below illustrates the shapes each loss wants.
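To make the difference concrete, here is a minimal sketch of what each loss accepts, reusing the example output from your question (treating column 1 as the positive-class score is only an assumption for illustration):

import torch
import torch.nn as nn

logits = torch.tensor([[ 0.3566, -0.0333],
                       [ 0.1154,  0.2842],
                       [-0.0016,  0.3767]])   # (N, D) = (3, 2) raw logits
labels = torch.tensor([0, 1, 1])              # (N,) = (3,) integer class ids

# CrossEntropyLoss: (N, D) logits plus (N,) integer labels
ce_loss = nn.CrossEntropyLoss()(logits, labels)

# BCELoss: (N,) probabilities p(y=1|x) plus (N,) float labels
probs = torch.sigmoid(logits[:, 1])           # squash the positive-class logit into [0, 1]
bce_loss = nn.BCELoss()(probs, labels.float())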
Solution:

Since you pass an (N, 2) tensor, it gives an error. You only need to pass p(y=1|x), so you can do
loss = loss_function(model_output.view(-1, 2)[:, 1], labels.view(-1).float())

Above I assumed that the second value is p(y=1|x). Note the .float() cast: BCELoss also requires float targets, not integer class ids.
A cleaner way would be to make the model output only one value, i.e. p(y=1|x), and pass that to the loss function. It seems from the code that you are passing logit values, not probability values, so you may also need to apply sigmoid to model_output if you want to use BCELoss, or alternatively you can use BCEWithLogitsLoss, which applies the sigmoid internally.
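A minimal sketch of that cleaner variant, assuming a one-unit classifier head (the nn.Linear(768, 1) and the random embedding below are stand-ins for your model, not your actual code):

import torch
import torch.nn as nn

classifier = nn.Linear(768, 1)            # single output: the logit of p(y=1|x)
loss_function = nn.BCEWithLogitsLoss()    # sigmoid + BCE in one, numerically stable

embedding = torch.randn(3, 768)           # stand-in for the pooled BERT embedding
labels = torch.tensor([[0], [1], [1]])    # your batch of labels, shape (3, 1)

logit = classifier(embedding)             # shape (3, 1)
loss = loss_function(logit.view(-1), labels.view(-1).float())  # both shape (3,)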
Another alternative is to change the loss to CrossEntropyLoss; that should work too, since it handles binary labels as a two-class problem.
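A sketch of that change, keeping the two-class head from your model as-is (CrossEntropyLoss wants the raw (N, 2) logits and integer class ids, so no sigmoid and no cast to float):

import torch
import torch.nn as nn

loss_function = nn.CrossEntropyLoss()
model_output = torch.tensor([[ 0.3566, -0.0333],   # the (3, 2) logits from your example
                             [ 0.1154,  0.2842],
                             [-0.0016,  0.3767]])
labels = torch.tensor([[0], [1], [1]])             # shape (3, 1), integer labels

loss = loss_function(model_output.view(-1, 2), labels.view(-1))  # labels stay long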