Given the following string:
str = "\\u20ac €"
How to decode it into € €
?
Using str.encode("utf-8").decode("unicode-escape")
returns € â\x82¬
(To clarify, I am looking for a general solution how to decode any mix of unicode and escaped characters)
A simple and fast solution is to use re.sub
to match \u
and exactly four hexadecimal digits, and convert those digits into a Unicode code point:
import re
s = r"blah bl\uah \u20ac € b\u20aclah\u12blah blah"
print(s)
s = re.sub(r'\\u([0-9a-fA-F]{4})',lambda m: chr(int(m.group(1),16)),s)
print(s)
Output:
blah bl\uah \u20ac € b\u20aclah\u12blah blah
blah bl\uah € € b€lah\u12blah blah