Tags: python, python-3.x, escaping

How do I .decode('string-escape') in Python 3?


I have some escaped strings that need to be unescaped. I'd like to do this in Python.

For example, in Python 2.7 I can do this:

>>> "\\123omething special".decode('string-escape')
'Something special'
>>> 

How do I do it in Python 3? This doesn't work:

>>> b"\\123omething special".decode('string-escape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: string-escape
>>> 

My goal is to be able to take a string like this:

s\000u\000p\000p\000o\000r\000t\000@\000p\000s\000i\000l\000o\000c\000.\000c\000o\000m\000

And turn it into:

"support@psiloc.com"

After I do the conversion, I'll probe to see if the string I have is encoded in UTF-8 or UTF-16.
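A rough sketch of what that probing step could look like, once the escapes have been resolved to raw bytes. The helper name `guess_encoding` and its thresholds are assumptions made for illustration, not an established recipe. It relies on the observation that mostly-ASCII UTF-16 text has a NUL in every other byte, and note that such data is also valid UTF-8, so the NUL pattern has to be checked first.

```python
from typing import Optional


def guess_encoding(data: bytes) -> Optional[str]:
    """Crude guess at the encoding of `data` (hypothetical helper).

    Returns 'utf-16', 'utf-16-le', 'utf-16-be', 'utf-8', or None.
    """
    # A byte-order mark settles it; the 'utf-16' codec consumes the BOM.
    if data.startswith((b'\xff\xfe', b'\xfe\xff')):
        return 'utf-16'

    # Mostly-ASCII UTF-16 text has a NUL in every other byte.
    if len(data) >= 2 and len(data) % 2 == 0:
        half = len(data) // 2
        even_nuls = data[0::2].count(0)  # NULs at offsets 0, 2, 4, ...
        odd_nuls = data[1::2].count(0)   # NULs at offsets 1, 3, 5, ...
        if odd_nuls > half // 2 and even_nuls == 0:
            return 'utf-16-le'
        if even_nuls > half // 2 and odd_nuls == 0:
            return 'utf-16-be'

    # Otherwise fall back to a strict UTF-8 check.
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return None
    return 'utf-8'
```

This is only a heuristic: it will misjudge UTF-16 text that is mostly non-ASCII, and it cannot tell UTF-8 apart from plain ASCII.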


Solution

  • If you want str-to-str decoding of escape sequences, so both input and output are Unicode:

    def string_escape(s, encoding='utf-8'):
        return (s.encode('latin1')         # To bytes, required by 'unicode-escape'
                 .decode('unicode-escape') # Resolve the backslash escapes (octal, hex, ...)
                 .encode('latin1')         # 1:1 mapping back to bytes
                 .decode(encoding))        # Decode original encoding
    

    Testing:

    >>> string_escape('\\123omething special')
    'Something special'
    
    >>> string_escape(r's\000u\000p\000p\000o\000r\000t\000@'
    ...               r'\000p\000s\000i\000l\000o\000c\000.\000c\000o\000m\000',
    ...               'utf-16-le')
    'support@psiloc.com'
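
The question's Python 3 attempt starts from a bytes object rather than a str. A minimal sketch of the same idea for bytes input follows; the name `bytes_escape` is an assumption chosen for illustration. The first `.encode('latin1')` step is dropped because `bytes.decode('unicode-escape')` can be applied to the bytes directly.

```python
def bytes_escape(b, encoding='utf-8'):
    """Resolve backslash escapes in bytes `b`, then decode as `encoding`.

    Hypothetical bytes-input counterpart of string_escape() above.
    """
    return (b.decode('unicode-escape')  # bytes -> str, resolving the escapes
             .encode('latin1')          # 1:1 mapping back to raw bytes
             .decode(encoding))         # Decode the original encoding
```

With this, `bytes_escape(b"\\123omething special")` returns `'Something special'`. As with the str version, the Latin-1 round trip raises `UnicodeEncodeError` if an escape such as `\u20ac` produces a code point above U+00FF, so the sketch only suits input whose escapes denote single bytes.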