pytorch, huggingface-transformers

The size of the logits of the RoBERTa model is weird


My input size is [8, 22]: a batch of 8 tokenized sentences, each of length 22. I don't want to use the default classifier.

from transformers import RobertaForSequenceClassification
import torch.nn as nn

model = RobertaForSequenceClassification.from_pretrained("xlm-roberta-large")
model.classifier = nn.Identity()

After calling model(batch), the size of the result is torch.Size([8, 22, 1024]). I have no idea why. Shouldn't it be [8, 1024]?


Solution

  • The model.classifier object you have replaced used to be an instance of RobertaClassificationHead. If you take a look at its source code [1], that head is hard-coded to index the first item along the second (sequence) dimension of its input, which corresponds to the <s> token (RoBERTa's equivalent of [CLS]). By replacing the head with nn.Identity you skip that indexing step, hence the [8, 22, 1024] output instead of [8, 1024].

    Long story short: don't assume functionality you haven't verified when it comes to code you didn't write, Hugging Face in particular (lots of ad-hoc classes and spaghetti interfaces, at least as far as I'm concerned). A sketch of how to get the [8, 1024] shape back follows.
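
    A minimal sketch, assuming batch is the [8, 22] tensor of input IDs from your question; the ClsPooler class and the variable names are illustrative, not part of transformers:

    import torch
    import torch.nn as nn
    from transformers import RobertaForSequenceClassification

    class ClsPooler(nn.Module):
        # Reproduce only the indexing step of RobertaClassificationHead:
        # keep the hidden state of the first (<s>/[CLS]) token of each sequence.
        def forward(self, features, **kwargs):
            return features[:, 0, :]  # [batch, seq_len, hidden] -> [batch, hidden]

    model = RobertaForSequenceClassification.from_pretrained("xlm-roberta-large")
    model.classifier = ClsPooler()

    with torch.no_grad():
        out = model(batch)        # batch: your [8, 22] tensor of input IDs
    print(out.logits.shape)       # torch.Size([8, 1024])

    Alternatively, you can leave the classifier untouched and index the base model's output yourself, e.g. model.roberta(batch).last_hidden_state[:, 0, :].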


    [1] RobertaClassificationHead source (transformers/models/roberta/modeling_roberta.py)