
Extract emoji from series of text


I have a problem extracting emoji from a pandas Series. The code I used:

import emoji

def extract_emojis(text):
    return ''.join(c for c in text if c in emoji.UNICODE_EMOJI)

for text in df['comments']:
    df['emoji']=extract_emojis(text)

Output:

     comments                                             | emoji
0    Its very beautiful                                   |
1    Your new bike, @keir ...?                            |
2    @philip 🤩🤩                                         |
3    Any news on the Canadian expansion mentioned i...    |
4    Rocky Mountain ❤️                                    |
...  ...                                                  | ...

Checking the function on a single string:

text = '@philip 🤩🤩'
extract_emojis(text)
--> '\U0001f929\U0001f929'        

Expected result:

     comments                                             | emoji
0    Its very beautiful                                   |
1    Your new bike, @keir ...?                            |
2    @philip 🤩🤩                                         | 🤩🤩
3    Any news on the Canadian expansion mentioned i...    |
4    Rocky Mountain ❤️                                    | ❤️
...  ...                                                  | ...

Note: I only posted this question after reading these related questions:
Python unicode character conversion for Emoji
How to extract all the emojis from text?


Solution

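  • extract_emojis itself works (the '\U0001f929\U0001f929' you saw is just the escaped form of '🤩🤩'); the problem is the loop. df['emoji'] = extract_emojis(text) assigns a single string to the whole column on every pass, so after the loop every row holds the result for the last comment, which evidently contains no emoji. Rather than iterating over the DataFrame yourself, use Series.apply, passing either an inline lambda or the named function.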

    import pandas as pd
    import emoji

    df = pd.DataFrame([['@philip 🤩🤩 '],
                       ['Rocky Mountain ❤️']], columns=['comments'])
    

    Using a lambda:

    df['emojis'] = df['comments'].apply(lambda row: ''.join(c for c in row if c in emoji.UNICODE_EMOJI))
    df
    

    Using a named function with apply:

    def extract_emojis(text):
        return ''.join(c for c in text if c in emoji.UNICODE_EMOJI)
    
    df['emoji_apply'] = df['comments'].apply(extract_emojis)
    df
    

    Output:

    comments    emojis
    @philip 🤩🤩    🤩🤩
    Rocky Mountain ❤️   ❤
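A side effect of testing one character at a time shows up in the last row: '❤️' is really two code points, U+2764 (HEAVY BLACK HEART) followed by the invisible variation selector U+FE0F, and only the first passes the per-character check, so the result is '❤' rather than the expected '❤️'. Multi-code-point emoji joined with a zero-width joiner (U+200D), such as family emoji, would likely be broken apart the same way. Depending on your version of the emoji package, emoji.UNICODE_EMOJI may also be missing (it was removed in the 2.x releases, which offer emoji.emoji_list() for whole emoji sequences instead).

The sketch below is an alternative using only the standard-library re module, matching whole sequences rather than single characters. The names EMOJI_RE and _EMOJI_RANGES, and this version of extract_emojis, are illustrative assumptions, not the emoji package's method. The code-point ranges are also an assumption: they cover the common emoji blocks but not the full Unicode emoji list (keycaps and flags, for example, are missed).

```python
import re

# ASSUMPTION: an illustrative, deliberately incomplete set of emoji ranges.
# The authoritative list is the Unicode emoji data; keycaps, flags and some
# newer emoji are not covered here.
_EMOJI_RANGES = (
    "\u2600-\u27BF"          # Miscellaneous Symbols and Dingbats (U+2764 heart)
    "\U0001F300-\U0001F5FF"  # Miscellaneous Symbols and Pictographs
    "\U0001F600-\U0001F64F"  # Emoticons
    "\U0001F680-\U0001F6FF"  # Transport and Map Symbols
    "\U0001F900-\U0001F9FF"  # Supplemental Symbols and Pictographs (U+1F929)
    "\U0001FA70-\U0001FAFF"  # Symbols and Pictographs Extended-A
)

_BASE = "[" + _EMOJI_RANGES + "]"
# Optional variation selector-16 (U+FE0F), then an optional skin-tone modifier.
_SUFFIX = "\uFE0F?[\U0001F3FB-\U0001F3FF]?"
# One emoji = a base character plus suffix, followed by any number of
# zero-width-joiner (U+200D) continuations, so the sequence stays whole.
EMOJI_RE = re.compile(_BASE + _SUFFIX + "(?:\u200D" + _BASE + _SUFFIX + ")*")


def extract_emojis(text: str) -> str:
    """Return every emoji sequence found in text, concatenated in order."""
    return "".join(EMOJI_RE.findall(text))


if __name__ == "__main__":
    print(extract_emojis("@philip \U0001F929\U0001F929"))
    # ascii() makes the invisible variation selector visible: '\u2764\ufe0f'
    print(ascii(extract_emojis("Rocky Mountain \u2764\uFE0F")))
```

With pandas it plugs into the same apply call as above, e.g. df['emoji'] = df['comments'].apply(extract_emojis), and the Rocky Mountain row keeps its variation selector. A hand-written range list will drift as Unicode adds emoji, which is the main reason to prefer a maintained package such as emoji when one is available.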