
How to return all labels and scores in SageMaker Inference?


I am using the HuggingFacePredictor from sagemaker.huggingface to run inference on some text, and I would like to get the scores for all labels.

Is there any way of getting, as response from the endpoint:

{
    "labels": ["help", "Greeting", "Farewell"],
    "score": [0.81, 0.1, 0.09]
}

(or similar)

Instead of:

{
    "label": "help",
    "score": 0.81
}

Here is some example code:

import boto3

from sagemaker.huggingface import HuggingFacePredictor
from sagemaker.session import Session

sagemaker_session = Session(boto_session=boto3.session.Session())

predictor = HuggingFacePredictor(
    endpoint_name=project, sagemaker_session=sagemaker_session
)
prediction = predictor.predict({"inputs": text})[0]

Solution

  • With your current code sample, it is not quite clear what specific task you are performing, but for the sake of this answer, I'll assume you're doing text classification.

    More importantly, the Hugging Face reference documentation for SageMaker says the following:

    The Inference Toolkit accepts inputs in the inputs key, and supports additional pipelines parameters in the parameters key. You can provide any of the supported kwargs from pipelines as parameters.

    If we check the arguments accepted by the TextClassificationPipeline, we can see that there is indeed one that returns the scores for all labels:

    return_all_scores (bool, optional, defaults to False) — Whether to return scores for all labels.

    Unfortunately I don't have access to SageMaker inference, but I can run a sample with a local pipeline to illustrate the output:

    from transformers import pipeline
    # uses a 2-way sentiment classification model by default
    pipe = pipeline("text-classification")
    
    pipe("I am really angry right now >:(", return_all_scores=True)
    # Output: [[{'label': 'NEGATIVE', 'score': 0.9989138841629028},
    #           {'label': 'POSITIVE', 'score': 0.0010860705515369773}]]
    

    SageMaker expects a slightly different input format. Based on that and on the example given in this notebook, I would assume that the corrected input in your own example code should look like this:

    {
        "inputs": text,
        "parameters": {"return_all_scores": True}
    }
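
To get from that response to the labels/score shape asked for in the question, a small reshaping step on the client side should be enough. The sketch below is only an illustration: `build_payload` and `to_labels_and_scores` are hypothetical helper names, not part of the SageMaker SDK. It also assumes the endpoint returns the same list of label/score dicts as the local pipeline output above, either flat or nested one level deep. I have not run it against a live endpoint.

```python
from typing import Any, Dict, List


def build_payload(text: str) -> Dict[str, Any]:
    """Request body that asks the pipeline for every label's score."""
    return {"inputs": text, "parameters": {"return_all_scores": True}}


def to_labels_and_scores(response: List[Any]) -> Dict[str, List[Any]]:
    """Reshape a list of {'label', 'score'} dicts into two parallel lists.

    Accepts a flat list of dicts or the same list nested one level deep
    (assumed response shapes for a single input text).
    """
    # Unwrap one level of nesting if present: [[{...}, ...]] -> [{...}, ...]
    items = response[0] if response and isinstance(response[0], list) else response
    # Sort so that the most likely label comes first
    ranked = sorted(items, key=lambda item: item["score"], reverse=True)
    return {
        "labels": [item["label"] for item in ranked],
        "score": [item["score"] for item in ranked],
    }


# Sample data copied from the local pipeline output above
sample = [[
    {"label": "NEGATIVE", "score": 0.9989138841629028},
    {"label": "POSITIVE", "score": 0.0010860705515369773},
]]
result = to_labels_and_scores(sample)
print(result)
```

With the predictor from the question, the call would presumably be to_labels_and_scores(predictor.predict(build_payload(text))). Note also that newer transformers releases deprecate return_all_scores in favor of top_k, so on a recent container {"top_k": None} may be the equivalent parameter; that is an assumption I have not verified on SageMaker.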