python | pytorch | huggingface-transformers

TypeError: dropout(): argument 'input' (position 1) must be Tensor, not tuple


I am studying NLP and trying to build a model for classifying sentences. I created a class wrapping a pretrained model, but I get an error saying that the input should be of type Tensor, not tuple. I am using transformers version 4.21.2.

from torch import nn
from transformers import BertForSequenceClassification

class BertClassificationModel(nn.Module):
    def __init__(self, bert_model_name, num_labels, dropout=0.1):
        super(BertClassificationModel, self).__init__()
        self.bert = BertForSequenceClassification.from_pretrained(bert_model_name, return_dict=False)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(768, num_labels)
        self.num_labels = num_labels

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        pooled_output = self.bert(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)
        return logits

TypeError: dropout(): argument 'input' (position 1) must be Tensor, not tuple


Solution

  • The issue you face is that the output of self.bert is a tuple, not a tensor:

    from transformers import BertForSequenceClassification, BertTokenizer
    
    bert_model_name = "bert-base-cased"
    t = BertTokenizer.from_pretrained(bert_model_name)
    m = BertForSequenceClassification.from_pretrained(bert_model_name, return_dict=False)
    
    o = m(**t("test test", return_tensors="pt"))
    
    print(type(o))
    

    Output:

    <class 'tuple'>
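
    For reference, simply unpacking that tuple would not fix the model above: with return_dict=False, the first element of the tuple is the classification head's logits, not BERT's pooled output. A quick check, continuing the snippet above:

    print(o[0].shape)  # torch.Size([1, 2]): the randomly initialized default head has two labels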
    

    I personally do not recommend using return_dict=False, as the code becomes more difficult to read. But changing this parameter doesn't help in your case, because you want to use the pooler output, which is removed by the classification head of BertForSequenceClassification (the outputs of BertForSequenceClassification are listed here).
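
    A quick way to see this (a minimal sketch, continuing the snippet above with the default return_dict=True): the returned SequenceClassifierOutput exposes logits, but no pooler output.

    m = BertForSequenceClassification.from_pretrained(bert_model_name)  # return_dict=True is the default
    o = m(**t("test test", return_tensors="pt"))
    print(type(o).__name__)  # SequenceClassifierOutput
    print(o.keys())          # odict_keys(['logits']); there is no pooler_output field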

    You already wrote in your own answer that you don't intend to use the classification head of BertForSequenceClassification, so you can load BertModel directly (instead of initializing BertForSequenceClassification and using only its BERT encoder, as you did with BertForSequenceClassification.from_pretrained(bert_model_name, return_dict=True).bert):

    from torch import nn
    from transformers import BertModel, BertTokenizer

    class BertClassificationModel(nn.Module):
        def __init__(self, bert_model_name, num_labels, dropout=0.1):
            super(BertClassificationModel, self).__init__()
            self.bert = BertModel.from_pretrained(bert_model_name)
            self.dropout = nn.Dropout(dropout)
            # 768 is the hidden size of bert-base; use self.bert.config.hidden_size for other checkpoints
            self.classifier = nn.Linear(768, num_labels)
            self.num_labels = num_labels

        def forward(self, input_ids, attention_mask=None, token_type_ids=None):
            # BertModel returns a model output object; take its pooler_output tensor
            pooled_output = self.bert(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids).pooler_output
            pooled_output = self.dropout(pooled_output)
            logits = self.classifier(pooled_output)
            return logits
    
    
    m = BertClassificationModel("bert-base-cased", 4, 0.1)
    o = m(**t("test test", return_tensors="pt"))  # t is the tokenizer created above
    print(o.shape)
    

    Output:

    torch.Size([1, 4])
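
    Since the model now returns plain logits, a hypothetical training step (not part of the original answer) could compute the loss directly, assuming integer class labels:

    import torch

    labels = torch.tensor([2])  # hypothetical gold label for the single example above
    loss = nn.CrossEntropyLoss()(o, labels)
    loss.backward()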