I have a problem extracting emoji from a pandas Series. The code I used:
import emoji
def extract_emojis(text):
    return ''.join(c for c in text if c in emoji.UNICODE_EMOJI)

for text in df['comments']:
    df['emoji'] = extract_emojis(text)
Output:
comments | emoji
0 Its very beautiful |
1 Your new bike, @keir ...? |
2 @philip 🤩🤩 |
3 Any news on the Canadian expansion mentioned i... |
4 Rocky Mountain ❤️ |
... ... ...
Checking the function on a single string:
text = '@philip 🤩🤩'
extract_emojis(text)
--> '\U0001f929\U0001f929'
Expected result:
comments | emoji
0 Its very beautiful |
1 Your new bike, @keir ...? |
2 @philip 🤩🤩 | 🤩🤩
3 Any news on the Canadian expansion mentioned i... |
4 Rocky Mountain ❤️ | ❤️
... ... ...
Note:
I have only asked this question after looking at these links:
Python unicode character conversion for Emoji
How to extract all the emojis from text?
Rather than iterating over the entire dataset, you can apply the function to every row with apply, passing either a named function or a lambda.
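To see why the loop in the question leaves the column empty, here is a minimal sketch. Each pass of the loop assigns a single string to the whole column, so only the result for the last row survives, while apply produces one value per row. The helper extract_non_ascii and the sample rows are hypothetical stand-ins, chosen so the sketch needs only pandas.

```python
import pandas as pd

def extract_non_ascii(text):
    # Hypothetical stand-in for extract_emojis: keeps every non-ASCII character.
    return ''.join(c for c in text if ord(c) > 127)

# Made-up sample data; the last row deliberately contains no emoji.
df = pd.DataFrame({'comments': ['Its very beautiful', '@philip 🤩🤩', 'plain text']})

# Loop from the question: a scalar assigned to df['looped'] is broadcast
# to every row, and each iteration overwrites the previous one.
for text in df['comments']:
    df['looped'] = extract_non_ascii(text)

# apply calls the function once per row and returns a Series aligned to the index.
df['applied'] = df['comments'].apply(extract_non_ascii)

print(df)
```

Here the looped column is empty in every row because the last comment has no emoji, which matches the symptom in the question.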
import pandas as pd
import emoji
df = pd.DataFrame([['@philip 🤩🤩 '],
                   ['Rocky Mountain ❤️']], columns=['comments'])
Using a lambda:
df['emojis'] = df['comments'].apply(lambda row: ''.join(c for c in row if c in emoji.UNICODE_EMOJI))
df
Using a named function with apply:
def extract_emojis(text):
    return ''.join(c for c in text if c in emoji.UNICODE_EMOJI)
df['emoji_apply'] = df['comments'].apply(extract_emojis)
df
Output:
comments emojis
@philip 🤩🤩 🤩🤩
Rocky Mountain ❤️ ❤