python twitter encode emoji

Translate an emoji Unicode string to text in Python


I have a list of tweets that was delivered as a CSV file. When I read it, the emoji code points show up as literal text such as <U+0001F602> instead of the actual characters, and I can't translate them to their real names ("waffle" or "heart").

import os
import pandas as pd

def load_csv(csv_name):
    # Look for the file in the current working directory
    path = os.path.join(os.getcwd(), csv_name)
    df = pd.read_csv(path, header=0, index_col=0, parse_dates=True, sep=",", encoding="utf-8")
    return df

csv_name = "tweets_nikekaepernick.csv"
df = load_csv(csv_name)

text = df["tweet_full_text"].iloc[0]
text

Out[]: 'Hi <U+0001F602><U+0001F602><U+0001F480><U+0001F480><U+0001F480><U+0001F480>'

Solution

  • Try the demoji package. See the demoji documentation for more details.

    code

    import re
    import demoji
    # Only needed for demoji < 1.0; newer releases bundle the emoji codes and deprecate this call
    demoji.download_codes()
    
    text = 'Hi <U+0001F602><U+0001F602><U+0001F480><U+0001F480><U+0001F480><U+0001F480>'
    
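    # Turn each <U+XXXXXXXX> into a \UXXXXXXXX escape, then decode the escapes.
    # Note: this assumes 8-digit code points and removes every '+', '<' and '>' in the text.
    text_ = re.sub(r'\+|>', '', text).replace('<', '\\').encode().decode('unicode-escape')
    
    # Find the emojis and their names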
    demoji.findall(text_)
    

    result

    demoji.findall(text_)
    Out[1]: {'💀': 'skull', '😂': 'face with tears of joy'}
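
The encode/decode conversion above strips every '+', '<' and '>' from the tweet and can garble non-ASCII text such as accented letters. A more targeted alternative is sketched below, assuming the placeholders always have the form <U+XXXX> with 4 to 8 hex digits. The name decode_codepoints is hypothetical, chosen for illustration only.

```python
import re

# Matches the "<U+XXXX>" placeholders seen in the CSV (4 to 8 hex digits).
_ESCAPED_CODEPOINT = re.compile(r"<U\+([0-9A-Fa-f]{4,8})>")


def decode_codepoints(text):
    """Replace each <U+XXXX> placeholder with the character it names."""
    def to_char(match):
        codepoint = int(match.group(1), 16)
        # Leave the placeholder untouched if it is outside the Unicode range.
        return chr(codepoint) if codepoint <= 0x10FFFF else match.group(0)

    return _ESCAPED_CODEPOINT.sub(to_char, text)


# Ordinary '<', '+' and '>' characters survive; only the placeholders change.
print(decode_codepoints("1 < 2, a+b > c <U+0001F602><U+0001F480>"))
```

The result can be passed to demoji.findall in place of text_. To convert a whole column, something like df["tweet_full_text"].map(decode_codepoints) should work, assuming every value in the column is a string.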
    

    More

    If you want to remove the emojis instead, you can try the code below, which is taken from another source:

    pattern = re.compile(
        "["
        "\U0001F600-\U0001F64F"  # emoticons
        "\U0001F300-\U0001F5FF"  # symbols & pictographs
        "\U0001F680-\U0001F6FF"  # transport & map symbols
        "\U0001F1E0-\U0001F1FF"  # flags (iOS)
        "]+",
        flags=re.UNICODE,
    )
    
    print(pattern.sub(r'', text_))
    >>> Hi
    

    Or, if you want to replace each emoji with its text name, you can try:

    import emoji
    print(emoji.demojize(text_))
    
    >>> Hi :face_with_tears_of_joy::face_with_tears_of_joy::skull::skull::skull::skull:
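
If installing a third-party package is not an option, the standard library's unicodedata module can also supply names. The sketch below assumes the text has already been converted to real characters, and the helper name describe_symbols is hypothetical. It reports every non-ASCII character, not only emojis, and it names multi-code-point emojis (flags, skin tones) one code point at a time, so treat it as a rough fallback rather than a replacement for demoji or emoji.

```python
import unicodedata


def describe_symbols(text):
    """Map each non-ASCII character in text to its Unicode name (lowercased)."""
    names = {}
    for char in text:
        if ord(char) > 0x7F:
            # Use a placeholder if this Python's Unicode database lacks the character.
            names[char] = unicodedata.name(char, "unknown").lower()
    return names


print(describe_symbols("Hi \U0001F602\U0001F602\U0001F480"))
# prints {'😂': 'face with tears of joy', '💀': 'skull'}
```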