python pandas sentiment-analysis huggingface-transformers

How to take just the score from HuggingFace Pipeline Sentiment Analysis


I'm quite new to HuggingFace pipelines and have run into something I can't figure out. I have searched for an answer without success, so any help would be great. I am trying to get just the score from the HF sentiment-analysis pipeline, not the label, because I want to add the scores to a dataframe that contains many cells of text. I know how to do this for a single sentence:

from transformers import pipeline
classifier = pipeline("sentiment-analysis")

result = classifier("This is a positive sentence")[0]
print(result['score'])

This gives me the following output:

0.9994597434997559

I know how to apply the classifier to my dataframe. However, when I adapt the code above to the dataframe, like so:

result = df['text'].apply(lambda x: classifier(x[:512]))[0]
df['sentiment'] = result['score']

My code fails on the second line, with the following error:

TypeError: list indices must be integers or slices, not str

Does anyone know how to fix this? I have tried a few things, but I haven't been able to figure it out so far. Any help would be immensely appreciated!


Solution

  • If your classifier output looks like this:

    [{'label': '1', 'score': 0.9999555349349976}]
    

    then you could extract the score with the following:

    df['sentiment'] = df['text'].apply(
      lambda x: classifier(x[:512])).str[0].str['score']
    

    Alternatively:

    Get the classifier output:

    df['result'] = df['text'].apply(lambda x: classifier(x[:512]))
    

    Extract the score from the output:

    df['sentiment'] = df['result'].str[0].str['score']
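
To see why the original attempt fails and why the fix works, here is a minimal sketch that runs without downloading a model. `fake_classifier` is a hypothetical stand-in (not part of transformers) that only mimics the return shape shown above, a one-element list holding a dict. The sample texts and scores are made up for illustration.

```python
import pandas as pd

def fake_classifier(text):
    """Hypothetical stand-in for the real pipeline: returns a list
    with one {'label', 'score'} dict, like the output shown above."""
    positive = "good" in text
    return [{"label": "POSITIVE" if positive else "NEGATIVE",
             "score": 0.99 if positive else 0.75}]

df = pd.DataFrame({"text": ["a good film", "a dull film"]})

# Each cell of `raw` is a one-element list holding a dict.
raw = df["text"].apply(lambda x: fake_classifier(x[:512]))

# Indexing the Series with [0] selects the first ROW (a list),
# so looking up a string key on it raises the TypeError from the question.
try:
    raw[0]["score"]
except TypeError as exc:
    error_message = str(exc)

# Index inside each cell instead: first list element, then the 'score' key.
df["sentiment"] = raw.apply(lambda out: out[0]["score"])

# Equivalent, using the .str accessor for element-wise indexing.
df["sentiment_str"] = raw.str[0].str["score"]

print(error_message)
print(df[["text", "sentiment"]])
```

With the real pipeline, replace `fake_classifier` with `classifier`. Note that `x[:512]` cuts the text at 512 characters, not 512 tokens. The pipeline also accepts a list of strings, for example `classifier(df['text'].str[:512].tolist())`, and returns one dict per input, which may be faster than calling it row by row.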