I'm trying to train a BigBird model (BigBirdForSequenceClassification), and when I get to the training step it fails with the error message below:
Traceback (most recent call last):
  File "C:\Users\######", line 189, in <module>
    train_loss, _ = train()
  File "C:\Users\######", line 152, in train
    loss = cross_entropy(preds, labels)
  File "C:\Users\#####\venv\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\######\venv\lib\site-packages\torch\nn\modules\loss.py", line 211, in forward
    return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)
  File "C:\Users\######\venv\lib\site-packages\torch\nn\functional.py", line 2532, in nll_loss
    return torch._C._nn.nll_loss_nd(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
TypeError: nll_loss_nd(): argument 'input' (position 1) must be Tensor, not tuple
From what I understand, the problem is that inside train() a tuple, rather than a tensor, ends up being passed to cross_entropy(). My question is how I should approach this: how do I make sure a tensor is passed there instead of a tuple? I have seen similar issues posted here, but none of the solutions seem to help in my case, not even
model = BigBirdForSequenceClassification(config).from_pretrained(checkpoint, return_dict=False)
(Without return_dict=False I get a similar error, except that it says "TypeError: nll_loss_nd(): argument 'input' (position 1) must be Tensor, not SequenceClassifierOutput".)
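Both messages point at the same thing: the model call does not return a bare tensor. By default a Hugging Face classifier returns a SequenceClassifierOutput, and with return_dict=False it returns a plain tuple; when no labels are passed, the logits are the first item in either case. The traceback also goes through F.nll_loss, which suggests cross_entropy is an nn.NLLLoss, and NLLLoss expects log-probabilities rather than raw logits. A minimal sketch, where the model's tuple is faked with random numbers (an assumption, no real model is loaded):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for what BigBirdForSequenceClassification returns when called with
# return_dict=False and no labels: a tuple whose first item is the logits.
outputs = (torch.randn(4, 2),)
labels = torch.tensor([0, 1, 1, 0])

logits = outputs[0]  # outputs.logits also works on a SequenceClassifierOutput

# The traceback goes through F.nll_loss, i.e. cross_entropy is an nn.NLLLoss,
# which expects log-probabilities, so normalise the raw logits first.
nll = nn.NLLLoss()
loss_nll = nll(F.log_softmax(logits, dim=-1), labels)

# nn.CrossEntropyLoss applies log_softmax itself, so it takes raw logits.
ce = nn.CrossEntropyLoss()
loss_ce = ce(logits, labels)

print(loss_nll.item(), loss_ce.item())  # the two values match
```

If labels are passed to the model, it also computes its own loss, which then becomes the first item and shifts the logits to the second, so `outputs.logits` is the less fragile choice with the default return type.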
Here is my training code:
def train():
    model.train()
    total_loss = 0
    total_preds = []
    for step, batch in enumerate(train_dataloader):
        if step % 10 == 0 and not step == 0:
            print('Batch {:>5,} of {:>5,}.'.format(step, len(train_dataloader)))
        batch = [r.to(device) for r in batch]
        sent_id, mask, labels = batch
        preds = model(sent_id, mask)
        loss = cross_entropy(preds, labels)
        total_loss = total_loss + loss.item()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        optimizer.zero_grad()
        preds = preds.detach().cpu().numpy()
        total_preds.append(preds)
    avg_loss = total_loss / len(train_dataloader)
    total_preds = np.concatenate(total_preds, axis=0)
    return avg_loss, total_preds
and then:
for epoch in range(epochs):
    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))
    train_loss, _ = train()
    train_losses.append(train_loss)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
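If you keep BigBirdForSequenceClassification, the train() function above could be adapted roughly as in this sketch. Assumptions: the model, dataloader, optimizer, loss and device are passed in as parameters instead of being globals, the loss is an nn.NLLLoss, and ToyClassifier is a hypothetical stand-in that returns a 1-tuple of logits (like a Hugging Face classifier with return_dict=False) so the sketch runs offline. Besides the loss, the preds.detach().cpu().numpy() line would also fail on a tuple, so the sketch stores the logits tensor instead.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def train(model, train_dataloader, optimizer, loss_fn, device):
    """One epoch of training; returns (average loss, stacked logits)."""
    model.train()
    total_loss = 0.0
    total_preds = []
    for step, batch in enumerate(train_dataloader):
        if step % 10 == 0 and step != 0:
            print('Batch {:>5,} of {:>5,}.'.format(step, len(train_dataloader)))
        sent_id, mask, labels = [r.to(device) for r in batch]
        outputs = model(sent_id, mask)
        # Hugging Face models return a tuple or ModelOutput; take the logits.
        logits = outputs[0]
        # loss_fn is assumed to be nn.NLLLoss, so convert logits to log-probs.
        loss = loss_fn(F.log_softmax(logits, dim=-1), labels)
        total_loss += loss.item()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        optimizer.zero_grad()
        # Store a tensor, not the tuple, so .numpy() works.
        total_preds.append(logits.detach().cpu().numpy())
    avg_loss = total_loss / len(train_dataloader)
    return avg_loss, np.concatenate(total_preds, axis=0)


class ToyClassifier(nn.Module):
    """Hypothetical stand-in for the real model: returns a 1-tuple of logits."""

    def __init__(self, vocab_size=100, hidden=16, num_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.head = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        # Mean-pool token embeddings over positions where the mask is 1.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (self.embed(input_ids) * mask).sum(1) / mask.sum(1).clamp(min=1.0)
        return (self.head(pooled),)


torch.manual_seed(0)
device = torch.device("cpu")
ids = torch.randint(0, 100, (32, 8))
data = TensorDataset(ids, torch.ones_like(ids), torch.randint(0, 2, (32,)))
loader = DataLoader(data, batch_size=8)
model = ToyClassifier().to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
avg_loss, preds = train(model, loader, optimizer, nn.NLLLoss(), device)
print(avg_loss, preds.shape)
```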
I would really appreciate any help with this. Thank you in advance!
OK, it seems I should have used BigBirdModel instead of BigBirdForSequenceClassification. Issue solved.
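BigBirdModel on its own also returns an output object rather than a tensor, so this resolution presumably relies on a custom classification head around it, such as the hypothetical BigBirdClassifier sketched here. It takes the pooled output (the second item of the tuple returned with return_dict=False, assuming the default pooling layer is kept) and ends in LogSoftmax, so its output is one tensor of log-probabilities that nn.NLLLoss accepts directly. DummyBackbone is a hypothetical offline stand-in for BigBirdModel so the sketch runs without downloading a checkpoint.

```python
import torch
import torch.nn as nn


class BigBirdClassifier(nn.Module):
    """Hypothetical classification head around a bare encoder such as
    transformers.BigBirdModel. Returns log-probabilities, so it pairs with
    nn.NLLLoss and the train() loop above without any unpacking."""

    def __init__(self, backbone, hidden_size, num_labels=2, dropout=0.1):
        super().__init__()
        self.backbone = backbone
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_size, num_labels)
        self.log_softmax = nn.LogSoftmax(dim=-1)

    def forward(self, sent_id, mask):
        # With return_dict=False the encoder returns a tuple:
        # (sequence_output, pooled_output, ...).
        outputs = self.backbone(sent_id, attention_mask=mask, return_dict=False)
        pooled = outputs[1]
        return self.log_softmax(self.fc(self.dropout(pooled)))


class DummyBackbone(nn.Module):
    """Hypothetical offline stand-in for BigBirdModel (same tuple layout)."""

    def __init__(self, vocab_size=100, hidden_size=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.pooler = nn.Linear(hidden_size, hidden_size)

    def forward(self, input_ids, attention_mask=None, return_dict=None):
        seq = self.embed(input_ids)
        pooled = torch.tanh(self.pooler(seq[:, 0]))  # first-token pooling
        return (seq, pooled)


torch.manual_seed(0)
model = BigBirdClassifier(DummyBackbone(), hidden_size=16)
sent_id = torch.randint(0, 100, (4, 8))
mask = torch.ones_like(sent_id)
preds = model(sent_id, mask)  # a plain tensor of log-probabilities
loss = nn.NLLLoss()(preds, torch.tensor([0, 1, 1, 0]))
```

With the real encoder, the wiring would be something like `backbone = BigBirdModel.from_pretrained(checkpoint)` followed by `model = BigBirdClassifier(backbone, backbone.config.hidden_size)`.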