Tags: python, pytorch, bert-language-model, huggingface-transformers

BERT: Convert 'SpanAnnotation' to answers using scores from Hugging Face models


I'm following the documentation for loading a pretrained question-answering model from Hugging Face:

from transformers import BertTokenizer, BertForQuestionAnswering
import torch
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForQuestionAnswering.from_pretrained('bert-base-uncased')
question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
inputs = tokenizer(question, text, return_tensors='pt')
start_positions = torch.tensor([1])
end_positions = torch.tensor([3])
outputs = model(**inputs, start_positions=start_positions, end_positions=end_positions)
loss = outputs.loss
start_scores = outputs.start_logits
end_scores = outputs.end_logits

This returns start and end scores, but how can I get a meaningful text answer from them?


Solution

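  • After a little digging, it looks like the scores can be converted to token positions, which can then be used to build the answer: take the highest-scoring start and end positions and decode the tokens between them. Note that 'bert-base-uncased' has not been fine-tuned for question answering, so its question-answering head is randomly initialized and the answers will be essentially random; a checkpoint fine-tuned on SQuAD, such as 'bert-large-uncased-whole-word-masking-finetuned-squad', should give meaningful spans. Here is a short example: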

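    # Most likely start and end token positions; +1 because slice ends are exclusive
    answer_start = torch.argmax(start_scores)
    answer_end = torch.argmax(end_scores) + 1

    # Map the selected token ids back to tokens, then join them into a string
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0][answer_start:answer_end])
    answer = tokenizer.convert_tokens_to_string(tokens)
    print(answer)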
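One caveat with taking the two argmaxes independently: nothing forces the end position to come after the start, so the slice can come back empty. A common workaround is to score every valid (start, end) pair jointly and keep the best one. The sketch below shows that idea in plain Python on made-up scores; the helper name `best_span` and the `max_answer_len` parameter are illustrative assumptions, not part of the transformers API.

```python
from typing import List, Tuple


def best_span(
    start_scores: List[float],
    end_scores: List[float],
    max_answer_len: int = 30,
) -> Tuple[int, int]:
    """Return (start, end) token indices, end inclusive.

    Picks the pair maximising start_scores[s] + end_scores[e]
    subject to s <= e < s + max_answer_len.
    """
    if not start_scores or len(start_scores) != len(end_scores):
        raise ValueError("score lists must be non-empty and of equal length")

    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        # Only consider end positions at or after the start, within the length cap
        last = min(s + max_answer_len, len(end_scores))
        for e in range(s, last):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


# Made-up scores where the independent argmaxes disagree:
# the best start is index 2, but the best end is index 1.
start = [0.1, 0.2, 5.0, 0.3, 0.1]
end = [0.0, 4.0, 0.2, 3.5, 0.1]

naive_start = start.index(max(start))
naive_end = end.index(max(end)) + 1
print(naive_start, naive_end)   # start is not before end, so the slice is empty

print(best_span(start, end))    # a valid span with start <= end
```

With the model outputs from the question, this could be called as `s, e = best_span(start_scores[0].tolist(), end_scores[0].tolist())`, and the answer tokens would then be `inputs["input_ids"][0][s:e + 1]`, since the returned end index is inclusive.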