python deep-learning pytorch bert-language-model huggingface-transformers

RuntimeError: The size of tensor a (4000) must match the size of tensor b (512) at non-singleton dimension 1

I'm trying to build a model for document classification. I'm using BERT with PyTorch.

I loaded the BERT model with the code below.

bert = AutoModel.from_pretrained('bert-base-uncased')

This is the code for training.

for epoch in range(epochs):
 
    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    #train model
    train_loss, _ = modhelper.train(proc.train_dataloader)

    #evaluate model
    valid_loss, _ = modhelper.evaluate()

    #save the best model
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(modhelper.model.state_dict(), 'saved_weights.pt')

    # append training and validation loss
    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')

This is my train method, called on the modhelper object.

def train(self, train_dataloader):
    self.model.train()
    total_loss, total_accuracy = 0, 0
    
    # empty list to save model predictions
    total_preds=[]
    
    # iterate over batches
    for step, batch in enumerate(train_dataloader):
        
        # progress update after every 50 batches.
        if step % 50 == 0 and not step == 0:
            print('  Batch {:>5,}  of  {:>5,}.'.format(step, len(train_dataloader)))
        
        # push the batch to gpu
        #batch = [r.to(device) for r in batch]
        
        sent_id, mask, labels = batch
        
        # clear previously calculated gradients 
        self.model.zero_grad()        

        print(sent_id.size(), mask.size())
        # get model predictions for the current batch
        preds = self.model(sent_id, mask) #This line throws the error
        
        # compute the loss between actual and predicted values
        self.loss = self.cross_entropy(preds, labels)
        
        # add on to the total loss
        total_loss = total_loss + self.loss.item()
        
        # backward pass to calculate the gradients
        self.loss.backward()
        
        # clip the gradients to 1.0; this helps prevent the exploding gradient problem
        torch.nn.utils.clip_grad_norm_(self.model.parameters(), 1.0)
        
        # update parameters
        self.optimizer.step()
        
        # detach the predictions from the graph and move them to CPU as NumPy arrays,
        # so that np.concatenate below can combine them
        preds = preds.detach().cpu().numpy()
        
        # append the model predictions
        total_preds.append(preds)
      
    # compute the training loss of the epoch
    avg_loss = total_loss / len(train_dataloader)
    
    # predictions are in the form of (no. of batches, size of batch, no. of classes).
    # reshape the predictions in form of (number of samples, no. of classes)
    total_preds  = np.concatenate(total_preds, axis=0)
      
    #returns the loss and predictions
    return avg_loss, total_preds

The line preds = self.model(sent_id, mask) throws the following error (full traceback included).

 Epoch 1 / 1
torch.Size([32, 4000]) torch.Size([32, 4000])
Traceback (most recent call last):

File "<ipython-input-39-17211d5a107c>", line 8, in <module>
train_loss, _ = modhelper.train(proc.train_dataloader)

File "E:\BertTorch\model.py", line 71, in train
preds = self.model(sent_id, mask)

File "E:\BertTorch\venv\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)

File "E:\BertTorch\model.py", line 181, in forward
#pass the inputs to the model

File "E:\BertTorch\venv\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)

File "E:\BertTorch\venv\lib\site-packages\transformers\modeling_bert.py", line 837, in forward
embedding_output = self.embeddings(

File "E:\BertTorch\venv\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)

File "E:\BertTorch\venv\lib\site-packages\transformers\modeling_bert.py", line 201, in forward
embeddings = inputs_embeds + position_embeddings + token_type_embeddings

RuntimeError: The size of tensor a (4000) must match the size of tensor b (512) at non-singleton dimension 1

As you can see, I print the tensor sizes in the code with print(sent_id.size(), mask.size()).

The output of that line is torch.Size([32, 4000]) torch.Size([32, 4000]).

The two sizes are the same, yet the error is still thrown. Any thoughts would be really appreciated.

Please comment if you need further information. I'll be quick to add whatever is required.

Solution

The issue is BERT's limit on input length. I passed sequences of 4000 tokens, but the maximum supported is 512 (and 2 of those are taken by '[CLS]' and '[SEP]' at the beginning and end of the string, so only 510 remain for the text itself). Reduce the sequence length or use some other model for your problem, such as Longformer, as suggested by @cronoik in the comments above.
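
To make the 510 + 2 arithmetic concrete, here is a minimal sketch of clipping a sequence before it reaches the model. It is not the code from my project: truncate_for_bert is a hypothetical helper, and the ids 101, 102 and 0 are assumed to be the [CLS], [SEP] and [PAD] entries of the bert-base-uncased vocabulary.

```python
# Hypothetical helper: names and defaults are illustrative, not from the original post.
from typing import List, Tuple

MAX_POSITIONS = 512  # BERT's position limit, the "512" in the error message
CLS_ID = 101         # assumed [CLS] id in the bert-base-uncased vocabulary
SEP_ID = 102         # assumed [SEP] id in the bert-base-uncased vocabulary
PAD_ID = 0           # assumed [PAD] id in the bert-base-uncased vocabulary


def truncate_for_bert(
    token_ids: List[int], max_len: int = MAX_POSITIONS
) -> Tuple[List[int], List[int]]:
    """Clip raw token ids (no special tokens) and pad to exactly max_len entries."""
    body = token_ids[: max_len - 2]        # leave room for [CLS] and [SEP]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)  # 1 marks real tokens
    padding = max_len - len(input_ids)
    input_ids += [PAD_ID] * padding        # pad short sequences up to max_len
    attention_mask += [0] * padding        # 0 marks padding
    return input_ids, attention_mask


long_doc = list(range(1000, 5000))         # stand-in for a 4000-token document
ids, mask = truncate_for_bert(long_doc)
print(len(ids), len(mask))                 # 512 512
```

In practice the Hugging Face tokenizer can do this for you: passing truncation=True and max_length=512 when encoding the text should have the same effect. Keep in mind that everything past the first 510 tokens is simply discarded, which may hurt classification of long documents.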

Thanks.