Tags: python, twitter, sentiment-analysis, huggingface-transformers, bert-language-model

Finding the scores for each tweet with a BERT-based sentiment analysis model


I am doing a sentiment analysis of Twitter posts and I have a question regarding "German Sentiment Classification with Bert":

I would like to display the sentiment scores (positive, negative, neutral) for each tweet, as shown on the model's card on Hugging Face (screenshot). I tried to step through the implementation line by line but could not figure out where to find the scores.


My code is based on the following example:


from germansentiment import SentimentModel

model = SentimentModel()

texts = [
    "Mit keinem guten Ergebniss", "Das ist gar nicht mal so gut",
    "Total awesome!", "nicht so schlecht wie erwartet",
    "Der Test verlief positiv.", "Sie fährt ein grünes Auto."]
       
result = model.predict_sentiment(texts)
print(result)
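
For reference, `predict_sentiment` here only returns the predicted label for each text, so `print(result)` gives something like the following (illustrative, no scores):

['negative', 'negative', 'positive', 'positive', 'neutral', 'neutral']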

Solution

  • You can inherit from the model's class and define a method that returns the probability scores:

    from typing import Dict, List, Tuple

    import torch
    from germansentiment import SentimentModel


    class SentimentModel(SentimentModel):
        # Subclass of the library model (same name on purpose, so the rest
        # of the code can stay unchanged).

        def predict_sentiment_proba(self, texts: List[str]) -> Tuple[torch.Tensor, Dict[int, str]]:
            texts = [self.clean_text(text) for text in texts]
            # add_special_tokens=True inserts the [CLS]/[SEP] (or <s>/</s>) tokens
            # the model expects; truncation=True caps inputs at the model's limit (512 tokens).
            encoded = self.tokenizer.batch_encode_plus(
                texts, padding=True, add_special_tokens=True, truncation=True, return_tensors="pt"
            )
            encoded = encoded.to(self.device)
            with torch.no_grad():
                logits = self.model(**encoded)

            # label_ids = torch.argmax(logits[0], dim=1)
            return torch.softmax(logits[0], dim=1), self.model.config.id2label
       
    texts = ["Mit keinem guten Ergebniss", "Das ist gar nicht mal so gut",
             "Total awesome!", "nicht so schlecht wie erwartet",
             "Der Test verlief positiv.", "Sie fährt ein grünes Auto."]
    
    model = SentimentModel()
    scores, ids = model.predict_sentiment_proba(texts)
    
    scores
    >tensor([[1.1602e-03, 9.9877e-01, 6.8676e-05],
            [8.8440e-04, 9.9909e-01, 2.3437e-05],
            [9.8738e-01, 1.2542e-02, 7.6997e-05],
            [9.7940e-01, 2.0516e-02, 8.2444e-05],
            [4.1755e-04, 4.6088e-04, 9.9912e-01],
            [2.1236e-05, 5.3932e-05, 9.9992e-01]])
    ids
    >{0: 'positive', 1: 'negative', 2: 'neutral'}
    
    scores.argmax(dim=-1)
    >tensor([1, 1, 0, 0, 2, 2]) #negative, negative, positive, positive, neutral, neutral