I have some sentences containing words and emojis, and my goal is to convert the emojis into their descriptions.
Example: "😊 Hello!" will be converted to "smiling_face_with_smiling_eyes Hello!"
I am not comfortable with encoding/decoding, and I have run into some issues. Thanks to another post here, "Converting emojis to unicode and viceversa", I think I may have found the solution. Still, I don't understand what is going on or why I should do this. I would appreciate some explanations.
I will show you two tests; the first one is the one that failed. Could you explain why?
# -*- coding: UTF-8 -*-
unicode = u"\U0001f600"
string = u"\U0001f600 Hello world"
print("SENT: "+string)
OUTPUT: SENT: 😀 Hello world
Test 1 (FAIL):
if string.find(unicode):
    print("after: "+string.replace(unicode,"grinning_face_with_sweat"))
else:
    print("not found : "+unicode)
OUTPUT: not found : 😀
Test 2:
if string.find(unicode.encode('unicode-escape').decode('ASCII')):
    print(string.replace(unicode,"grinning_face_with_sweat"))
else:
    print("not found : "+unicode)
OUTPUT: grinning_face_with_sweat Hello world
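Since the text from unicode is at the beginning of string, string.find(unicode) returns 0, the index of the match. Python treats 0 as false in an if test, so Test 1 takes the else branch even though the emoji was found. If the text is not found, find returns -1, which is treated as true. Your code should be: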
if string.find(unicode) != -1:
    print("after: "+string.replace(unicode,"grinning_face_with_sweat"))
else:
    print("not found : "+unicode)
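This also suggests why Test 2 only appeared to work. A minimal sketch, assuming Python 3 (the names emoji, text, escaped and result are mine, chosen to avoid reusing unicode, which shadows a Python 2 built-in): the escaped form is the ten ASCII characters \U0001f600, which do not occur in the string, so find returns -1 and the if branch runs by accident.

```python
emoji = "\U0001f600"
text = "\U0001f600 Hello world"

# Test 1: the emoji sits at index 0, and 0 is falsy in an if test.
print(text.find(emoji))          # 0
print(bool(text.find(emoji)))    # False

# Test 2: the escaped form is the literal ASCII text \U0001f600
# (a backslash, a 'U' and eight hex digits), not the emoji itself.
escaped = emoji.encode("unicode-escape").decode("ascii")
print(len(escaped))              # 10
print(text.find(escaped))        # -1, meaning "not found"
print(bool(text.find(escaped)))  # True, so the if branch runs anyway

# A membership test avoids the index-versus-truthiness trap entirely.
if emoji in text:
    result = text.replace(emoji, "grinning_face_with_sweat")
    print(result)
```

The in operator says what is actually meant here ("is the emoji present?"), so there is no index to misread.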
BTW, are you still using Python 2? I strongly suggest switching to Python 3. And if you're using Python 3, there's no need to precede strings with u, since all strings in Python 3 are Unicode.
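For the original goal of turning every emoji into its description, here is a rough sketch, assuming Python 3 and the standard unicodedata module. The function name describe_emojis is hypothetical, and using the Unicode category "So" (Symbol, other) to spot emojis is only a heuristic of mine: it also matches non-emoji symbols such as ©, and it does not handle multi-character sequences like flags or skin-tone variants.

```python
import unicodedata

def describe_emojis(text):
    """Replace each 'Symbol, other' character with its Unicode name."""
    pieces = []
    for ch in text:
        # Heuristic: most single-code-point emojis have category "So".
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            # Fall back to the character itself if it has no name.
            pieces.append(name.lower().replace(" ", "_") if name else ch)
        else:
            pieces.append(ch)
    return "".join(pieces)

print(describe_emojis("\U0001f60a Hello!"))
# smiling_face_with_smiling_eyes Hello!
```

The descriptions come from the official Unicode character names, so U+1F600 becomes grinning_face rather than a hand-written label.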