
Get sentiment score of emoji #Python


df['emojis']
0        NaN
1        NaN
2         🤩🤩
3        NaN
4          ❤
        ... 
26368    NaN
26369    NaN
26370    NaN
26371     🔥👌
26372    NaN
Name: emojis, Length: 26373, dtype: object

From the df above, I would like to calculate the sentiment score of the emojis in each row. If a row is NaN, the result should be NaN too.

#!pip install emosent-py
from emosent import get_emoji_sentiment_rank
def emoji_sentiment(text):
    return get_emoji_sentiment_rank(text)["sentiment_score"]

emoji_sentiment("😂")  # --> 0.221

Applying it to the whole column:

df['emoji_sentiment'] = df['emojis'].apply(emoji_sentiment)

The code above raises KeyError: nan.
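To see where the error comes from, here is a minimal reproduction with a stand-in for the library. It assumes, as the KeyError: nan suggests, that get_emoji_sentiment_rank looks its argument up in a dictionary keyed by a single emoji; FAKE_TABLE is hypothetical, and only its 😂 entry (0.221) comes from the example above. Missing values in an object column are float NaN, so apply hands that float to the function, which then uses it as a dictionary key.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the library's lookup table (assumption: a plain
# dict keyed by a single emoji). Only the 😂 score comes from the question.
FAKE_TABLE = {"😂": {"sentiment_score": 0.221}}


def emoji_sentiment(text):
    return FAKE_TABLE[text]["sentiment_score"]


s = pd.Series([np.nan, "😂"], name="emojis")
print(type(s[0]))  # missing values in an object column are float NaN

try:
    s.apply(emoji_sentiment)
except KeyError as err:
    # The float NaN itself was used as the dictionary key
    print("KeyError:", err)
```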

Expected result:

         emojis | emoji_sentiment
0           NaN | NaN
1           NaN | NaN
2            🤩🤩 | (a decimal number)
3           NaN | NaN
4             ❤ | (a decimal number)
...
26368       NaN | NaN
26369       NaN | NaN
26370       NaN | NaN
26371        🔥👌 | (a decimal number)
26372       NaN | NaN

Solution

  • From your error, I'm guessing get_emoji_sentiment_rank(text)["sentiment_score"] fails when text is NaN, so you can either apply the function only to the non-NaN rows and assign the result back to those rows (preferable, but you first need to create the emoji_sentiment column with a default NaN value):

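    import numpy as np

    df['emoji_sentiment'] = np.nan  # initialise the column with NaN for all rows (np.NaN was removed in NumPy 2.0)
    not_na_idx = df['emojis'].notna()  # boolean mask of rows that actually contain emojis
    df.loc[not_na_idx, 'emoji_sentiment'] = df.loc[not_na_idx, 'emojis'].apply(emoji_sentiment)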
    

    or you change emoji_sentiment() so that it returns NaN for missing values:

    def emoji_sentiment(text):
        if pd.isna(text):  # missing value: skip the lookup and propagate NaN
            return np.nan
        return get_emoji_sentiment_rank(text)["sentiment_score"]
    

    (uglier and less performant, but still feasible)
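Rows such as 🤩🤩 and 🔥👌 hold more than one emoji. If the library looks up one emoji at a time, as the KeyError suggests, passing the whole string may raise KeyError as well. The sketch below is one possible workaround, not part of emosent: it scores each character separately, averages whatever can be scored, and keeps NaN for missing rows. mean_emoji_sentiment and add_emoji_sentiment are hypothetical helper names, and score_fn is assumed to raise KeyError for characters it does not know. Averaging is only one choice; a sum or max would slot in the same way.

```python
import math


def mean_emoji_sentiment(text, score_fn):
    """Average score_fn over the characters of text; NaN if none can be scored."""
    if not isinstance(text, str):  # missing values arrive as float NaN
        return math.nan
    scores = []
    for ch in text:  # iterates code points, not whole emoji sequences
        try:
            scores.append(score_fn(ch))
        except KeyError:
            # Assumption: unknown characters raise KeyError. Skip them
            # (e.g. the U+FE0F variation selector) instead of failing the row.
            continue
    return sum(scores) / len(scores) if scores else math.nan


def add_emoji_sentiment(df, score_fn, col="emojis", out="emoji_sentiment"):
    """Score only the non-missing rows; index alignment leaves the rest as NaN."""
    df[out] = df[col].dropna().apply(lambda text: mean_emoji_sentiment(text, score_fn))
    return df
```

With the wrapper from the question this would be called as df = add_emoji_sentiment(df, emoji_sentiment). Assigning the result of dropna().apply(...) back to the frame works because pandas aligns on the index, so the dropped rows come out as NaN without a separate initialisation step. Because Python strings iterate by code point, emojis built from several code points (ZWJ sequences, for example) are split and scored piece by piece.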