
python3: Unescape unicode escapes surrounded by unescaped characters


I received JSON data in which some Unicode characters are escaped and others are not.

>>> example = r'сло\u0301во'

What is the best way to unescape those characters? In the example below, what would the function unescape look like? Is there a built-in function that does this?

>>> unescape(example)
сло́во

Solution

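  • This solution assumes that every instance of \u in the original string begins a valid Unicode escape (\u followed by four hex digits); any other \u, such as the one in a path like C:\users, makes the final decode step raise an error: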

    def unescape(in_str):
        """Unicode-unescape string with only some characters escaped."""
        in_str = in_str.encode('unicode-escape')   # bytes with all chars escaped (the original escapes have the backslash escaped)
        in_str = in_str.replace(b'\\\\u', b'\\u')  # unescape the \
        in_str = in_str.decode('unicode-escape')   # unescape unicode
        return in_str
    

    Or, as a one-liner:

    def unescape(in_str):
        """Unicode-unescape string with only some characters escaped."""
        return in_str.encode('unicode-escape').replace(b'\\\\u', b'\\u').decode('unicode-escape')
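If the input may contain a \u that is not a real escape, a regex-based variant is one possible alternative. This is a minimal sketch and not part of the original answer: the name `unescape_re` and the pattern are illustrative assumptions. It replaces only sequences made of a backslash, `u`, and exactly four hex digits, and leaves everything else untouched.

```python
import re

# Assumed pattern: a literal backslash, 'u', then exactly four hex digits.
_U_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')


def unescape_re(in_str):
    """Replace well-formed \\uXXXX sequences with the character they name."""
    # Convert the four hex digits to a code point, then to a character.
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), in_str)


example = r'сло\u0301во'
print(unescape_re(example))

# No well-formed escape here ("\users" and "\u12" lack four hex digits),
# so the string comes back unchanged.
print(unescape_re(r'C:\users\u12'))
```

Like the encode/decode approach, this sketch does not join surrogate pairs such as \ud83d\ude00 into a single character; each half becomes a lone surrogate.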