The following code is taken from the Hugging Face Transformers documentation example for BertForMultipleChoice:
from transformers import BertTokenizer, BertForMultipleChoice
import torch
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMultipleChoice.from_pretrained('bert-base-uncased')
prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
choice0 = "It is eaten with a fork and a knife."
choice1 = "It is eaten while held in the hand."
labels = torch.tensor(0).unsqueeze(0) # choice0 is correct (according to Wikipedia ;)), batch size 1
encoding = tokenizer([prompt, prompt], [choice0, choice1], return_tensors='pt', padding=True)
outputs = model(**{k: v.unsqueeze(0) for k,v in encoding.items()}, labels=labels) # batch size is 1
# the linear classifier still needs to be trained
loss = outputs.loss
logits = outputs.logits
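
For reference, since this is a multiple-choice head, the logits should come out with shape (batch_size, num_choices), i.e. (1, 2) in this example, one score per choice. A quick check (my own addition, not part of the docs example):

print(logits.shape)  # expected: torch.Size([1, 2]) -- one score per choice in the batch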
I guess logits in the final line holds a score for how likely the model thinks each choice is to be correct, but I don't know whether the choice with the maximum or the minimum value is the one presumed to be correct. Going by the definition of the word "logit", my guess is that the item with the highest logit is the one the model considers most likely to be correct.
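
For context, this is how I would extract the predicted choice if the highest-logit interpretation is right (a minimal sketch; predicted_choice is my own variable name, not from the docs):

predicted_choice = logits.argmax(dim=-1).item()  # index of the choice with the highest logit
print(predicted_choice)  # 0 would mean choice0, 1 would mean choice1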