Print Unicode string containing both accented characters and emoticons


I'm reading a file with Python that contains exactly the following line

à è ì ò ù ç @ \U0001F914

where \U0001F914 is the Unicode escape sequence for an emoticon, stored in the file as literal text.

If I interpret the string as

string=string.decode('utf-8')

I get:

à è ì ò ù ç @ \U0001F914

whereas if I interpret it as follows:

string=string.decode('unicode-escape')

I get:

Ã  Ã¨ Ã¬ Ã² Ã¹ Ã§ @ 🤔

How can I print instead:

à è ì ò ù ç @ 🤔

I'm a beginner, so pardon me if my question is stupid, but I can't figure it out.

Thanks in advance.


Solution

  • This may not be the best solution, but you can first use encode with 'unicode-escape' instead of decode, which gives:

    data = 'à è ì ò ù ç @ \U0001F914'
    print data.encode('unicode-escape')
    
    \xe0 \xe8 \xec \xf2 \xf9 \xe7 @ \\U0001F914
    

    Then you have to replace \\ with \ (in Python source these are written '\\\\' and '\\', because each backslash must itself be escaped):

    data = 'à è ì ò ù ç @ \U0001F914'
    print data.encode('unicode-escape').replace('\\\\', '\\')
    
    \xe0 \xe8 \xec \xf2 \xf9 \xe7 @ \U0001F914
    

    Finally, you can apply your decode with 'unicode-escape':

    data = 'à è ì ò ù ç @ \U0001F914'
    print data.encode('unicode-escape').replace('\\\\', '\\').decode('unicode-escape')
    
    à è ì ò ù ç @ 🤔
    

    EDIT:

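    It seems you have to call .decode('utf-8') on the byte string first, so that data is a unicode object. Otherwise Python 2 attempts an implicit ASCII decode before encoding and fails on the accented characters: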

    #-*- coding: utf-8 -*-
    
    data = 'à è ì ò ù ç @ \U0001F914'.decode('utf-8')
    
    result = data.encode('unicode-escape').replace('\\\\', '\\').decode('unicode-escape')
    
    print result  #.encode('utf-8')
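
For readers on Python 3, where str no longer has a .decode method, here is a minimal sketch of a different technique from the round trip above: decode the bytes as UTF-8 once, then expand only the literal \UXXXXXXXX sequences with a regular expression. The function name expand_unicode_escapes and the sample line are illustrative assumptions, not part of the original answer, and the sketch only handles the eight-digit \U form shown in the question.

```python
import re

# Matches a literal backslash followed by U and exactly eight hex digits,
# e.g. the ten characters \U0001F914 as they appear in the file.
_ESCAPE_RE = re.compile(r'\\U([0-9a-fA-F]{8})')


def expand_unicode_escapes(raw_bytes):
    """Decode UTF-8 bytes, then expand literal \\UXXXXXXXX sequences."""
    # Hypothetical helper: the accented characters are decoded correctly here,
    # so they are never pushed through the unicode-escape codec.
    text = raw_bytes.decode('utf-8')
    # Replace each escape with the character whose code point it names.
    return _ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), text)


# The file's line as bytes: UTF-8 accented letters plus a literal backslash escape.
line = 'à è ì ò ù ç @ \\U0001F914'.encode('utf-8')
result = expand_unicode_escapes(line)
print(result)
```

Because the accented characters are never treated as escape input, they cannot be misread as Latin-1, and no backslash doubling has to be undone afterwards.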