Python pandas: Remove emojis from DataFrame


I have a dataframe which contains a lot of different emojis and I want to remove them. I looked at answers to similar questions but they didn't work for me.

    index| messages
    ----------------
    1    |Hello! 👋
    2    |Good Morning 😃
    3    |How are you ?
    4    | Good 👍
    5    | Ländern

Now I want to remove all these emojis from the DataFrame so that it looks like this:

    index| messages
    ----------------
    1    |Hello!
    2    |Good Morning   
    3    |How are you ?
    4    | Good 
    5    |Ländern

I tried the solution here, but unfortunately it also removes all non-English letters like "ä". How can I remove emojis from a DataFrame?


Solution

  • This solution keeps all ASCII and Latin-1 characters, i.e. characters between U+0000 and U+00FF in this list. For extended Latin plus Greek, use < 1024:

    import pandas as pd

    df = pd.DataFrame({'messages': ['Länder 🇩🇪❤️', 'Hello! 👋']})

    # Keep only characters whose code point is below 256 (ASCII and Latin-1)
    filter_char = lambda c: ord(c) < 256
    df['messages'] = df['messages'].apply(lambda s: ''.join(filter(filter_char, s)))
    

    Result:

      messages
    0  Länder 
    1  Hello!
    

    Note that this does not work for Japanese text, for example, because those characters lie above U+00FF and are removed as well. Another problem is that the heart "emoji" is actually a Dingbat, so I can't simply filter for the Basic Multilingual Plane of Unicode, oh well.
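
If non-Latin text such as Japanese has to survive, one possible workaround is to remove characters from explicit code point ranges instead of keeping everything below a threshold. The following is only a rough sketch, not part of the answer above: the name `remove_emojis` and the ranges in `EMOJI_PATTERN` are assumptions chosen for illustration, and the ranges are not an exhaustive list of emoji. The symbol range also covers the Dingbats block, which is where the heart (U+2764) lives.

```python
import re

import pandas as pd

# Assumed, non-exhaustive code point ranges; extend them as needed.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F1E6-\U0001F1FF"  # regional indicator symbols (flag pairs)
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, newer symbols
    "\u2600-\u27BF"          # Miscellaneous Symbols and Dingbats (includes U+2764)
    "\uFE0F"                 # variation selector-16 (emoji presentation)
    "\u200D"                 # zero width joiner used in emoji sequences
    "]+"
)


def remove_emojis(text: str) -> str:
    """Delete every character that falls into one of the ranges above."""
    return EMOJI_PATTERN.sub("", text)


df = pd.DataFrame({"messages": ["Länder 🇩🇪❤️", "Hello! 👋", "こんにちは 😃"]})
df["messages"] = df["messages"].apply(remove_emojis)
print(df["messages"].tolist())
```

With these ranges, "ä" and the Japanese characters are kept while the flag, the heart and the faces are dropped. Whitespace next to a removed emoji stays in place, so adding `.str.strip()` afterwards may be useful. Symbols outside the listed ranges will still get through.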