I'm having trouble displaying raw JSON data in the terminal using Python 3. I get the JSON as a response from urllib:
r = urlopen(request)
response = r.read()
The result is a byte string b"...", part of which contains non-ASCII characters like b"Chybn\\u00e9 heslo", which should give me "Chybné heslo".
But I don't know how to decode it to display "Chybné heslo". If I do:
print(b"Chybn\\u00e9 heslo".decode('utf-8'))
I just get "Chybn\u00e9 heslo". What am I doing wrong here?
Use the unicode-escape codec:
byte_str = b"Chybn\\u00e9 heslo"  # same bytes as in the question: a literal backslash, then u00e9
print(byte_str.decode('unicode-escape')) # Chybné heslo
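Since the question says the response is JSON, parsing it with the standard json module is probably the more robust route, because the JSON parser resolves \uXXXX escapes itself. A minimal sketch, assuming the body is UTF-8 encoded; the payload and the "error" key below are made up for illustration and stand in for whatever r.read() actually returns:

```python
import json

# Hypothetical stand-in for r.read(): the raw bytes of a JSON response.
response = b'{"error": "Chybn\\u00e9 heslo"}'

# Decode the bytes to text, then let the JSON parser resolve the \u escapes.
data = json.loads(response.decode("utf-8"))
print(data["error"])

# ensure_ascii=False keeps non-ASCII characters readable when pretty-printing.
pretty = json.dumps(data, ensure_ascii=False, indent=2)
print(pretty)
```

This also avoids running unicode-escape over the whole response, which can mangle any characters that arrive as real UTF-8 bytes instead of escapes.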
The reason for your problem is that in a bytes literal \u00e9 is not a Unicode code point. It's just a sequence of six bytes:
>>> len(b'\u00e9') # whereas len('\u00e9') == 1
6
>>> [b for b in b'\u00e9']
[92, 117, 48, 48, 101, 57]
These bytes are all ASCII, and therefore also valid UTF-8, so when you decode them as UTF-8 you get the corresponding sequence of characters:
>>> b'\u00e9'.decode('utf-8')
'\\u00e9'
>>> [chr(b) for b in b'\u00e9'] # decoding in 'byte-by-byte' mode
['\\', 'u', '0', '0', 'e', '9']
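To put the two codecs side by side, here is a small sketch (variable names are mine). It also shows one caveat worth knowing: unicode-escape interprets every byte outside an escape sequence as Latin-1, so input that mixes real UTF-8 bytes with \u escapes comes out garbled.

```python
# Escape sequence spelled out in plain ASCII bytes, as in the question.
raw = b"Chybn\\u00e9 heslo"

as_utf8 = raw.decode("utf-8")              # escapes stay as literal text
as_escaped = raw.decode("unicode-escape")  # escapes are interpreted
print(as_utf8)      # Chybn\u00e9 heslo
print(as_escaped)   # Chybné heslo

# Caveat: a real UTF-8 encoded "é" (bytes C3 A9) next to an escaped one.
mixed = "Chybné \\u00e9".encode("utf-8")
garbled = mixed.decode("unicode-escape")   # C3 and A9 are read as Latin-1
print(garbled)      # ChybnÃ© é
```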
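Also note that \\ and \ are equivalent in some string literals: when the backslash does not start an escape sequence that the literal type recognizes, it is kept as-is (for more information check this).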
For example:
>>> b'\\u' == b'\u'
True
>>> b'\\u00e9' == b'\u00e9'
True
>>> b'\\n' == b'\n'
False
>>> '\\u00e9' == '\u00e9'
False
>>> '\\z' == '\z'
True
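Relying on unrecognized escapes such as b'\u' or '\z' is fragile: recent Python versions warn about them (DeprecationWarning since 3.6, SyntaxWarning since 3.12). If you need the literal backslash, a raw literal or an explicit double backslash states the intent unambiguously. A short sketch:

```python
# Two explicit spellings of the same six bytes: backslash, u, 0, 0, e, 9.
doubled = b"\\u00e9"
raw_literal = rb"\u00e9"

print(doubled == raw_literal)   # True
print(len(raw_literal))         # 6
print(list(raw_literal))        # [92, 117, 48, 48, 101, 57]

# The same holds for text strings: r'\z' is a backslash followed by 'z'.
print(r"\z" == "\\z")           # True
```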