I'm reading a file with Python that contains exactly the following line:
à è ì ò ù ç @ \U0001F914
where \U0001F914 is the Unicode escape sequence for an emoji.
If I decode the string as UTF-8:
string=string.decode('utf-8')
I get:
à è ì ò ù ç @ \U0001F914
whereas if I decode it with unicode-escape:
string=string.decode('unicode-escape')
I get:
Ã  Ã¨ Ã¬ Ã² Ã¹ Ã§ @ 🤔
How can I print instead:
à è ì ò ù ç @ 🤔
I'm a beginner, so pardon me if my question is stupid, but I can't figure it out.
Thanks in advance.
This may not be the best solution, but first you can use encode with 'unicode-escape' instead of decode, which gives:
data = 'à è ì ò ù ç @ \U0001F914'
print data.encode('unicode-escape')
\xe0 \xe8 \xec \xf2 \xf9 \xe7 @ \\U0001F914
Then you have to replace \\ with \ (in a Python string literal these are written as '\\\\' and '\\'):
data = 'à è ì ò ù ç @ \U0001F914'
print data.encode('unicode-escape').replace('\\\\', '\\')
\xe0 \xe8 \xec \xf2 \xf9 \xe7 @ \U0001F914
Finally, you can apply your decode with 'unicode-escape':
data = 'à è ì ò ù ç @ \U0001F914'
print data.encode('unicode-escape').replace('\\\\', '\\').decode('unicode-escape')
à è ì ò ù ç @ 🤔
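For readers on Python 3, the same three steps can be sketched roughly as follows. This is only an illustration under some assumptions: the file is UTF-8 encoded, `raw` stands in for the bytes read from it, and `unescape_round_trip` is a hypothetical helper name, not something from the question.

```python
# Python 3 sketch of the encode / replace / decode round trip described above.
# `raw` is a stand-in for the bytes read from the file (assumed UTF-8);
# it ends with a literal backslash followed by U0001F914.
raw = 'à è ì ò ù ç @ \\U0001F914'.encode('utf-8')


def unescape_round_trip(raw_bytes):
    """Hypothetical helper: expand literal \\UXXXXXXXX escapes in UTF-8 bytes."""
    # 1. Decode the bytes so the accented letters become real characters.
    text = raw_bytes.decode('utf-8')
    # 2. Escape everything: 'à' becomes \xe0, and the literal backslash
    #    in front of U0001F914 is doubled.
    escaped = text.encode('unicode-escape')
    # 3. Undo the doubling so the sequence is a valid escape again.
    escaped = escaped.replace(b'\\\\', b'\\')
    # 4. Decode all escapes back into characters.
    return escaped.decode('unicode-escape')


result = unescape_round_trip(raw)
```

Printing `result` on a UTF-8 terminal should show the accented letters followed by the emoji. Note that step 3 also collapses any genuine double backslashes that happen to be in the text, so this is best suited to input like the line in the question.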
EDIT:
It seems you also have to add .decode('utf-8') at the beginning, so that data is a unicode object rather than a byte string:
#-*- coding: utf-8 -*-
data = 'à è ì ò ù ç @ \U0001F914'.decode('utf-8')
result = data.encode('unicode-escape').replace('\\\\', '\\').decode('unicode-escape')
print result #.encode('utf-8')
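As a possible alternative to the round trip, one could expand only the \UXXXXXXXX and \uXXXX sequences with a regular expression and leave every other character alone. The sketch below assumes Python 3 and text that has already been decoded from UTF-8; `expand_unicode_escapes` and `_ESCAPE` are hypothetical names introduced here for illustration.

```python
import re

# Matches a literal backslash followed by U + 8 hex digits or u + 4 hex digits.
_ESCAPE = re.compile(r'\\U([0-9a-fA-F]{8})|\\u([0-9a-fA-F]{4})')


def expand_unicode_escapes(text):
    """Hypothetical helper: replace literal \\U/\\u escapes with their characters."""
    def _replace(match):
        # Exactly one of the two groups is set, depending on which form matched.
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))
    return _ESCAPE.sub(_replace, text)


# The line from the question, already decoded to str.
line = 'à è ì ò ù ç @ \\U0001F914'
result = expand_unicode_escapes(line)
```

Because nothing else in the string is touched, the accented letters never go through an escape step. One caveat: `chr` raises `ValueError` for code points above 0x10FFFF, so malformed escapes would need extra handling.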