Is there a way to convert a \x
escaped string like "\\xe8\\xaa\\x9e\\xe8\\xa8\\x80"
into readable form: "語言"
?
>>> a = "\\xe8\\xaa\\x9e\\xe8\\xa8\\x80"
>>> print(a)
\xe8\xaa\x9e\xe8\xa8\x80
I am aware that there is a similar question here, but it seems the solution is only for latin characters. How can I convert this form of string into readable CJK characters?
Decode it first using 'unicode-escape', then as 'utf8':
a = "\\xe8\\xaa\\x9e\\xe8\\xa8\\x80"
decoded = a.encode('latin1').decode('unicode_escape').encode('latin1').decode('utf8')
print(decoded)
# 語言
Note that since we can only decode bytes objects, we need to transparently encode it in between, using 'latin1'.