
Python 3: How are emojis and Unicode handled and read in Python? A test


I have some sentences containing words and emojis, and my goal is to convert the emojis into their descriptions.

Example: "😊 Hello!" should be converted into "smiling_face_with_smiling_eyes Hello!"
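For the conversion itself, the standard library's unicodedata.name() returns the official Unicode name of a single character (for U+1F60A it is "SMILING FACE WITH SMILING EYES"), which already matches the description format in the example. The sketch below is one possible approach, not a complete solution. It assumes that treating every character in Unicode category "So" (Symbol, other) as an emoji is good enough. That heuristic also catches symbols such as the copyright sign, and it does not treat multi-code-point emojis (skin-tone modifiers, ZWJ sequences) as a unit. The function name demojize_text is hypothetical.

```python
import unicodedata


def demojize_text(text: str) -> str:
    """Replace each emoji-like character with its lowercased Unicode name.

    Heuristic (assumption): a character counts as an emoji if its Unicode
    general category is "So" (Symbol, other). This also matches symbols
    such as the copyright sign, and multi-code-point emojis (skin tones,
    ZWJ sequences) are not handled as a single unit.
    """
    parts = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            # e.g. U+1F60A -> "SMILING FACE WITH SMILING EYES"
            name = unicodedata.name(ch, ch)  # fall back to the char if unnamed
            parts.append(name.lower().replace(" ", "_"))
        else:
            parts.append(ch)
    return "".join(parts)


print(demojize_text("\U0001f60a Hello!"))  # smiling_face_with_smiling_eyes Hello!
```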

I am not comfortable with encoding/decoding, and I have run into some issues. Thanks to another post here, Converting emojis to unicode and viceversa, I think I may have found the solution. Still, I don't understand what is going on or why I should do this, so I would appreciate some explanations.

I will show you two tests; the first one fails. Could you explain why?

# -*- coding: UTF-8 -*-
unicode = u"\U0001f600"
string = u"\U0001f600 Hello world"
print("SENT: "+string)

OUTPUT: SENT: 😀 Hello world
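To see what the string actually holds: in Python 3 a str is a sequence of Unicode code points, so the emoji is a single character, whether it is written as the escape "\U0001f600" or pasted in literally. Bytes only appear when you explicitly encode, and the 'unicode-escape' codec produces the ASCII text of the escape sequence, not the emoji itself. The variable names below are illustrative only.

```python
emoji = "\U0001f600"
sentence = "\U0001f600 Hello world"

print(len(emoji))                 # 1: a single code point
print(hex(ord(sentence[0])))      # 0x1f600: the first character is the emoji
print(emoji == "😀")              # True: escape and literal are the same str
print(emoji.encode("utf-8"))      # b'\xf0\x9f\x98\x80': four UTF-8 bytes

# unicode-escape turns the emoji into the ASCII text of its escape sequence
escaped = emoji.encode("unicode-escape").decode("ascii")
print(escaped, len(escaped))      # \U0001f600 10: backslash, 'U', 8 hex digits
```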

Test 1 (FAIL):

if string.find(unicode):
   print("after: "+string.replace(unicode,"grinning_face_with_sweat"))
else:
   print("not found : "+unicode)

OUTPUT: not found : 😀

Test 2:

if string.find(unicode.encode('unicode-escape').decode('ASCII')):
   print(string.replace(unicode,"grinning_face_with_sweat"))
else:
   print("not found : "+unicode)

OUTPUT: grinning_face_with_sweat Hello world
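Printing the return values of str.find() makes the behaviour of both tests visible. find() returns an index (here 0) or -1, and in an if statement 0 counts as false while -1 counts as true. The variable names are illustrative only.

```python
emoji = "\U0001f600"
sentence = "\U0001f600 Hello world"

# Test 1: the emoji is found at index 0, and bool(0) is False
pos1 = sentence.find(emoji)
print(pos1, bool(pos1))           # 0 False -> the else branch runs

# Test 2: the needle is the 10-character ASCII text of the escape sequence,
# which does not occur in the sentence, so find() returns -1 (truthy)
needle = emoji.encode("unicode-escape").decode("ascii")
pos2 = sentence.find(needle)
print(pos2, bool(pos2))           # -1 True -> the if branch runs anyway
```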


Solution

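  • Since the emoji stored in unicode is at the very beginning of string, string.find(unicode) returns its index, 0. In an if statement 0 counts as false, so the else branch runs even though the emoji was found. find() returns -1 only when the substring is not found, so your code should compare against -1 explicitly: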

    if string.find(unicode) != -1:
       print("after: "+string.replace(unicode,"grinning_face_with_sweat"))
    else:
       print("not found : "+unicode)
    

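    As for Test 2, it only appears to work. unicode.encode('unicode-escape').decode('ASCII') produces the 10-character ASCII text \U0001f600 (a backslash followed by U0001f600), not the emoji. That text does not occur in string, so find() returns -1, which counts as true, and the if branch runs. The replace() call then uses the real emoji, which is why the output looks right. No encoding or decoding is needed here.

    BTW, are you still using Python 2? I strongly suggest switching to Python 3. And if you're using Python 3, there's no need to precede strings with u, since all strings in Python 3 are Unicode (the prefix is still accepted from Python 3.3 on, but it has no effect).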
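If you only need a yes/no answer, the in operator avoids the -1 comparison altogether, and str.translate() with a table from str.maketrans() can replace several single-code-point emojis in one pass. A sketch, assuming a hand-written mapping: EMOJI_NAMES is a hypothetical table you would fill in yourself. Note that the official Unicode name of U+1F600 is GRINNING FACE, so the sketch uses "grinning_face" for it.

```python
emoji = "\U0001f600"
sentence = "\U0001f600 Hello world"

# Membership test: True whenever the substring occurs, including at index 0
if emoji in sentence:
    print("after: " + sentence.replace(emoji, "grinning_face"))
else:
    print("not found: " + emoji)

# Hypothetical mapping from single-code-point emojis to descriptions
EMOJI_NAMES = {
    "\U0001f600": "grinning_face",
    "\U0001f60a": "smiling_face_with_smiling_eyes",
}
table = str.maketrans(EMOJI_NAMES)  # dict keys must be single characters
result = "\U0001f60a Hi \U0001f600".translate(table)
print(result)  # smiling_face_with_smiling_eyes Hi grinning_face
```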