df
0 NaN
1 NaN
2 🤩🤩
3 NaN
4 ❤
...
26368 NaN
26369 NaN
26370 NaN
26371 🔥👌
26372 NaN
Name: emojis, Length: 26373, dtype: object
From the df above, I would like to calculate the sentiment score of the emojis in each row. If NaN, then return NaN.
#!pip install emosent-py
from emosent import get_emoji_sentiment_rank
def emoji_sentiment(text):
    return get_emoji_sentiment_rank(text)["sentiment_score"]
emoji_sentiment("😂")
--> 0.221
Applying to the whole column
df['emoji_sentiment'] = df['emojis'].apply(emoji_sentiment)
The code above raises KeyError: nan
Expected result:
df emoji_sentiment
0 NaN | NaN
1 NaN | NaN
2 🤩🤩 | (a decimal number)
3 NaN | NaN
4 ❤ | (a decimal number)
...
26368 NaN | NaN
26369 NaN | NaN
26370 NaN | NaN
26371 🔥👌 | (a decimal number)
26372 NaN | NaN
From your error, I'm guessing get_emoji_sentiment_rank(text)["sentiment_score"] fails if text is NaN, so you can either apply the function only to the rows that are non-NaN and assign the result back to those rows (preferable, but you first need to create the column emoji_sentiment with a default NaN value):
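import numpy as np

df['emoji_sentiment'] = np.nan  # init the value for all rows (np.NaN was removed in NumPy 2.0)
not_na_idx = df['emojis'].notna()  # boolean mask of the rows that contain emojis
df.loc[not_na_idx, 'emoji_sentiment'] = df.loc[not_na_idx, 'emojis'].apply(emoji_sentiment)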
or you change the return of emoji_sentiment():
import numpy as np
import pandas as pd
def emoji_sentiment(text):
    return get_emoji_sentiment_rank(text)["sentiment_score"] if not pd.isna(text) else np.nan
(uglier and less performant, but still feasible)
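A third option that may be worth trying is Series.map with na_action="ignore", which leaves NaN entries as NaN and never passes them to the function. The sketch below uses a made-up lookup table and a hypothetical fake_emoji_sentiment() as a stand-in for the real emoji_sentiment(), so it runs without emosent installed; the scores are illustrative only, not real sentiment values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for emoji_sentiment(); the scores are invented for this example.
FAKE_SCORES = {"😂": 0.2, "❤": 0.7}

def fake_emoji_sentiment(text):
    # A plain dict lookup: raises KeyError for any key not in the table, including NaN.
    return FAKE_SCORES[text]

df = pd.DataFrame({"emojis": [np.nan, "😂", np.nan, "❤"]})

# na_action="ignore" propagates NaN without calling the function on those rows.
df["emoji_sentiment"] = df["emojis"].map(fake_emoji_sentiment, na_action="ignore")
print(df)
```

With the real function this would presumably become df['emojis'].map(emoji_sentiment, na_action='ignore'), with no need to create the column beforehand.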