
Obtain prediction score


I'm following this notebook for inference with LayoutLM. I would like to modify the code to access the prediction score, as a percentage, for each prediction, the same way I access true_predictions and true_boxes.

So far, I tried computing probabilities = torch.nn.functional.softmax(logits, dim=-1), like this:

import itertools

import numpy as np
import torch

# Until here, just following the notebook
logits = outputs.logits  # shape: (num_chunks, seq_len, num_labels)

predictions = logits.argmax(-1).squeeze().tolist()
token_boxes = encoding.bbox.squeeze().tolist()
# Softmax over the label dimension: one probability per label for every token
probabilities = torch.nn.functional.softmax(logits, dim=-1).squeeze().tolist()

# A single 512-token chunk loses its chunk dimension after squeeze(), so wrap it again
if len(token_boxes) == 512:
    predictions = [predictions]
    token_boxes = [token_boxes]
    probabilities = [probabilities]

# Flatten all chunks into one token sequence
predictions = list(itertools.chain(*predictions))
token_boxes = list(itertools.chain(*token_boxes))
probabilities = list(itertools.chain(*probabilities))

# With pre-split words, offsets are per word, so a non-zero start marks a subword continuation
is_subword = np.array(offset_mapping.squeeze().tolist())[:, 0] != 0
true_predictions = [self.id2label[pred] for idx, pred in enumerate(predictions) if not is_subword[idx]]
true_boxes = [box for idx, box in enumerate(token_boxes) if not is_subword[idx]]
true_probabilities = [probability for idx, probability in enumerate(probabilities) if not is_subword[idx]]

for prediction, box, probability in zip(true_predictions, true_boxes, true_probabilities):
    print(probability)

Output: [0.00010619303793646395, 3.339954128023237e-05, 2.2820451704319566e-05, 2.2919863113202155e-05, 0.0005767009570263326, 5.0725124310702085e-05, 3.0033241273486055e-05, 0.006056534126400948, 4.6057226427365094e-05, 1.2512471585068852e-05, 0.0002005402639042586, 2.0308254534029402e-05, 0.992790937423706, 3.023005228897091e-05]

This means there are 14 labels, and the most likely one is label #13 (index 12) at 0.9928 (99.28%). But that is not quite what I was aiming for. I'm not looking for the probability of each label for that prediction; I'm looking for the prediction score itself, something like "this prediction has a confidence of 75%".


Solution

  • Probabilities are in the range [0, 1], so if you need percentages, scale the output of the softmax activation by 100.

    softmax_output = [
    0.00010619303793646395, 3.339954128023237e-05,
    2.2820451704319566e-05, 2.2919863113202155e-05, 0.0005767009570263326,
    5.0725124310702085e-05, 3.0033241273486055e-05, 0.006056534126400948,
    4.6057226427365094e-05, 1.2512471585068852e-05, 0.0002005402639042586,
    2.0308254534029402e-05, 0.992790937423706, 3.023005228897091e-05
    ]
    
    probabilities_percentage = [round(prob * 100, 2) for prob in softmax_output]
    
    print(probabilities_percentage)
    
    # [0.01, 0.0, 0.0, 0.0, 0.06, 0.01, 0.0, 0.61, 0.0, 0.0, 0.02, 0.0, 99.28, 0.0]
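
If what you want is one number per prediction rather than the whole distribution, a common heuristic is to report the softmax probability of the predicted (argmax) label, scaled to a percentage. The sketch below assumes each input item is one token's softmax vector, like the items of true_probabilities in the question; the function name prediction_confidences is hypothetical.

```python
def prediction_confidences(token_probabilities):
    """Return (predicted_label_index, confidence_percent) for each token.

    Assumes token_probabilities holds one softmax vector (list of floats)
    per token, e.g. the items of true_probabilities from the question.
    """
    results = []
    for probs in token_probabilities:
        # The predicted label is the argmax, so its probability is the score
        best_idx = max(range(len(probs)), key=probs.__getitem__)
        results.append((best_idx, round(probs[best_idx] * 100, 2)))
    return results


# Toy softmax outputs for two tokens (illustrative values only)
print(prediction_confidences([[0.05, 0.75, 0.20], [0.6, 0.3, 0.1]]))
# [(1, 75.0), (0, 60.0)]
```

Applied to the 14-value output above, this gives index 12 (label #13) at 99.28%. In the question's loop, max(probability) * 100 gives the same number for each prediction, because prediction is the argmax of the same logits. Softmax scores are not guaranteed to be calibrated, though, so 99.28% does not necessarily mean the model is right 99.28% of the time.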
    

    EDIT 1:

    In case you need a confidence score for the predictions, you should look at the accuracy on unseen data (the test set). If you get an accuracy of 90% on the test set, you could (roughly) assume that your model will be right about 90% of the time.
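
To estimate that test-set accuracy, you can compare predicted labels against the gold labels of held-out documents. This is a minimal, framework-free sketch: the function names and the FUNSD-style label strings are hypothetical, and it assumes predictions and gold labels are already aligned one per word (for example, true_predictions versus the annotated labels). It also computes precision per predicted label, which answers the narrower question "when the model predicts this label, how often is it right?"

```python
from collections import defaultdict


def accuracy(predicted, gold):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def per_label_precision(predicted, gold):
    """For each predicted label, the fraction of those predictions that were correct."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for p, g in zip(predicted, gold):
        totals[p] += 1
        hits[p] += (p == g)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical, word-aligned labels from a held-out test set
predicted = ["B-HEADER", "I-HEADER", "O", "B-QUESTION", "O"]
gold = ["B-HEADER", "I-HEADER", "O", "B-ANSWER", "O"]

print(f"accuracy: {accuracy(predicted, gold) * 100:.1f}%")  # accuracy: 80.0%
print(per_label_precision(predicted, gold))
# {'B-HEADER': 1.0, 'I-HEADER': 1.0, 'O': 1.0, 'B-QUESTION': 0.0}
```

Plain token accuracy can look inflated when most words are labeled O; per-label numbers, or entity-level metrics such as those from the seqeval library, often tell you more.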