Tags: python, twitter, sentiment-analysis, huggingface-transformers, bert-language-model

Finding the scores for each tweet with a BERT-based sentiment analysis model


I am doing a sentiment analysis of Twitter posts and I have a question regarding "German Sentiment Classification with Bert":

I would like to display the sentiment scores (positive, negative, neutral) for each tweet, as shown on the model's card on Hugging Face (screenshot). I tried to step through the implementation line by line but could not figure out where to find the scores.


My code is based on the following example:


from germansentiment import SentimentModel

model = SentimentModel()

texts = [
    "Mit keinem guten Ergebniss", "Das ist gar nicht mal so gut",
    "Total awesome!", "nicht so schlecht wie erwartet",
    "Der Test verlief positiv.", "Sie fährt ein grünes Auto."]
       
result = model.predict_sentiment(texts)
print(result)
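
For reference, `predict_sentiment` here only returns the predicted label for each text, so `print(result)` gives something like the following (illustrative, no scores):

['negative', 'negative', 'positive', 'positive', 'neutral', 'neutral']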

Solution

  • You can inherit from the model's class and define a method that returns the probability scores:

    from typing import Dict, List, Tuple

    import torch
    from germansentiment import SentimentModel


    class SentimentModel(SentimentModel):
        # Subclass of the library model (same name on purpose, so the rest
        # of the code can stay unchanged).

        def predict_sentiment_proba(self, texts: List[str]) -> Tuple[torch.Tensor, Dict[int, str]]:
            texts = [self.clean_text(text) for text in texts]
            # add_special_tokens=True inserts the [CLS]/[SEP] (or <s>/</s>) tokens
            # the model expects; truncation=True caps inputs at the model's limit (512 tokens).
            encoded = self.tokenizer.batch_encode_plus(
                texts, padding=True, add_special_tokens=True, truncation=True, return_tensors="pt"
            )
            encoded = encoded.to(self.device)
            with torch.no_grad():
                logits = self.model(**encoded)

            # label_ids = torch.argmax(logits[0], dim=1)
            return torch.softmax(logits[0], dim=1), self.model.config.id2label
       
    texts = ["Mit keinem guten Ergebniss", "Das ist gar nicht mal so gut",
             "Total awesome!", "nicht so schlecht wie erwartet",
             "Der Test verlief positiv.", "Sie fährt ein grünes Auto."]
    
    model = SentimentModel()
    scores, ids = model.predict_sentiment_proba(texts)
    
    scores
    >tensor([[1.1602e-03, 9.9877e-01, 6.8676e-05],
            [8.8440e-04, 9.9909e-01, 2.3437e-05],
            [9.8738e-01, 1.2542e-02, 7.6997e-05],
            [9.7940e-01, 2.0516e-02, 8.2444e-05],
            [4.1755e-04, 4.6088e-04, 9.9912e-01],
            [2.1236e-05, 5.3932e-05, 9.9992e-01]])
    ids
    >{0: 'positive', 1: 'negative', 2: 'neutral'}
    
    scores.argmax(dim=-1)
    >tensor([1, 1, 0, 0, 2, 2]) #negative, negative, positive, positive, neutral, neutral