python pytorch huggingface-transformers bert-language-model

TypeError: nll_loss_nd(): argument 'input' (position 1) must be Tensor, not tuple

So I'm trying to train my BigBird model (BigBirdForSequenceClassification) and I got to the moment of the training, which ends with below error message:

Traceback (most recent call last):
  File "C:\Users\######", line 189, in <module>
    train_loss, _ = train()  
  File "C:\Users\######", line 152, in train
    loss = cross_entropy(preds, labels)
  File "C:\Users\#####\venv\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\######\venv\lib\site-packages\torch\nn\modules\loss.py", line 211, in forward
    return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)
  File "C:\Users\######\venv\lib\site-packages\torch\nn\functional.py", line 2532, in nll_loss
    return torch._C._nn.nll_loss_nd(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
TypeError: nll_loss_nd(): argument 'input' (position 1) must be Tensor, not tuple

From what I understand, the problem happens because the train() function returns the tuple. Now - my question is how I should approach such issue? How do I change the output of train() function to return tensor instead of tuple? I have seen similar issues posted here but none of the solutions seems to be helpful in my case, not even

model = BigBirdForSequenceClassification(config).from_pretrained(checkpoint, return_dict=False)

(When I don't add return_dict=False I got similiar error message but it says "TypeError: nll_loss_nd(): argument 'input' (position 1) must be Tensor, not SequenceClassifierOutput" Please see my train code below:

def train():
    model.train()
    total_loss = 0
    total_preds = []
    
    for step, batch in enumerate(train_dataloader):
        
        if step % 10 == 0 and not step == 0:
            print('Batch {:>5,}  of  {:>5,}.'.format(step, len(train_dataloader)))
            
        batch = [r.to(device) for r in batch]
        sent_id, mask, labels = batch

        preds = model(sent_id, mask)

        loss = cross_entropy(preds, labels)
        total_loss = total_loss + loss.item()
        loss.backward()

        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        optimizer.zero_grad()
        preds = preds.detach().cpu().numpy()
        total_preds.append(preds)

    avg_loss = total_loss / len(train_dataloader)
    total_preds = np.concatenate(total_preds, axis=0)
    return avg_loss, total_preds

and then:

for epoch in range(epochs):
    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()  
    train_losses.append(train_loss)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

I will really appreciate any help on this case and thank you in advance!

Solution

Ok, so it seems like I should have used BigBirdModel instead of BigBirdForSequenceClassification - issue solved