python, pandas, nlp, flair

Adding a '-' sign to negative Flair sentiment scores


I am writing sentiment analysis code for stock market analysis. This is the core of the code:

import flair
import pandas as pd

flair_sentiment = flair.models.TextClassifier.load('en-sentiment')
columns = ['ticker', 'date', 'time', 'headline']
# parsed_news comes from the earlier scraping step (not shown)
parsed_and_scored_news = pd.DataFrame(parsed_news, columns=columns)

sentiment = []
for head in parsed_and_scored_news['headline']:
    s = flair.data.Sentence(head)
    flair_sentiment.predict(s)
    # s.labels is a list of Label objects (one per sentence for this model)
    sentiment.append(s.labels)

# Build the score frame and join it once, after every headline is scored
scores_df = pd.DataFrame(sentiment)
parsed_and_scored_news = parsed_and_scored_news.join(scores_df, rsuffix='_right')

# Convert the date column from string to datetime
parsed_and_scored_news['date'] = pd.to_datetime(parsed_and_scored_news.date).dt.date
parsed_and_scored_news.head()

The following output is produced:

    ticker     date      time              headline                                    0
0   AMZN    2021-03-26  02:37PM Tech stocks are going to do vey well going for...   POSITIVE (0.9986)
1   AMZN    2021-03-26  01:17PM Amazon mocked idea its drivers urinated in bot...   NEGATIVE (0.9855)
2   AMZN    2021-03-26  01:11PM ThredUp CEO on IPO day: Dont tax resale and Am...   NEGATIVE (0.6743)
3   AMZN    2021-03-26  12:54PM Why this retailer is seeing a triple-digit sal...   POSITIVE (0.9597)
4   AMZN    2021-03-26  12:07PM How to secure your smart home camera                POSITIVE (0.9981)
        

Since I want to feed the data into an ML model, I need the score to be numeric. I know that probability = sentence.labels[0].score gives me only the scores, but then there is no way to tell whether a statement is positive or negative. Is there a way to put a '-' (minus) sign in front of the scores classified as negative? For example, NEGATIVE (0.9855) would become -0.9855. That way the information is numeric and still keeps the direction of the sentiment.
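The mapping asked for here fits in a small function. This is a minimal sketch: the name signed_score is hypothetical, and it assumes the label value is exactly 'POSITIVE' or 'NEGATIVE' with a score between 0 and 1, as returned by s.labels[0].value and s.labels[0].score. The negative score keeps its magnitude, so NEGATIVE (0.9855) becomes -0.9855.

```python
def signed_score(value, score):
    """Combine a sentiment label and its confidence into one signed number.

    Hypothetical helper. Assumes value is 'POSITIVE' or 'NEGATIVE' and
    score is the model's confidence in that label (between 0 and 1).
    """
    if value == "NEGATIVE":
        return -score  # negative sentiment -> negative number
    if value == "POSITIVE":
        return score   # positive sentiment keeps its score
    raise ValueError(f"unexpected label: {value!r}")


# Labels and scores taken from the printed output above
for value, score in [("POSITIVE", 0.9986), ("NEGATIVE", 0.9855), ("NEGATIVE", 0.6743)]:
    print(value, score, "->", signed_score(value, score))
```

Inside the scoring loop you could then append signed_score(s.labels[0].value, s.labels[0].score) to a single list instead of keeping labels and scores in separate columns.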


Solution

  • This piece of code worked for me:

    import flair
    import pandas as pd

    sentiment = []
    sentiment_score = []
    for head in parsed_and_scored_news['headline']:
        s = flair.data.Sentence(head)
        flair_sentiment.predict(s)
        # value is 'POSITIVE' or 'NEGATIVE'; score is the model's confidence in that label
        sentiment.append(s.labels[0].value)
        sentiment_score.append(s.labels[0].score)
    scores_df = pd.DataFrame(sentiment)
    scores_df_1 = pd.DataFrame(sentiment_score)
    parsed_and_scored_news = parsed_and_scored_news.join(scores_df, rsuffix='_right')
    parsed_and_scored_news = parsed_and_scored_news.join(scores_df_1, rsuffix='_right')
    # Both frames have a column named 0, so after the second join the
    # label column is named '0' and the score column '0_right'

    lst = parsed_and_scored_news['0_right'].tolist()
    for count, item in enumerate(parsed_and_scored_news['0']):
        # Flip the sign of the score for headlines labelled NEGATIVE
        if item == 'NEGATIVE':
            lst[count] = -lst[count]

    scores_final = pd.DataFrame(lst)
    parsed_and_scored_news = parsed_and_scored_news.join(scores_final, rsuffix='_final')
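Instead of looping over the label column to flip signs, pandas can do it in one vectorized step with Series.where, which keeps values where a condition is True and substitutes another value where it is False. A minimal sketch, assuming the labels and scores sit in columns named sentiment and sentiment_score (hypothetical names; in the code above they are '0' and '0_right'). The values are the labels and scores from the question's output.

```python
import pandas as pd

# Hypothetical frame holding the two lists the scoring loop builds
df = pd.DataFrame({
    "sentiment": ["POSITIVE", "NEGATIVE", "NEGATIVE", "POSITIVE", "POSITIVE"],
    "sentiment_score": [0.9986, 0.9855, 0.6743, 0.9597, 0.9981],
})

# Keep the score where the label is POSITIVE; otherwise use its negation
df["signed_score"] = df["sentiment_score"].where(
    df["sentiment"] == "POSITIVE", -df["sentiment_score"]
)
print(df)
```

With the column names from the answer, the same step would be parsed_and_scored_news['0_right'].where(parsed_and_scored_news['0'] == 'POSITIVE', -parsed_and_scored_news['0_right']). Assigning the lists directly, e.g. parsed_and_scored_news['sentiment'] = sentiment, also avoids relying on the '0' / '0_right' names that the joins produce.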