
Decoding specific escaped characters in a Python string


I have a Python variable (named var) containing a string with the following literal data:

day\r\n\\night

In hex, it is (note the trailing BEL, 0x07, which is not visible in the text above):

64  61  79  5C  72  5C  6E  5C  5C  6E  69  67  68  74  07
d   a   y   \   r   \   n   \   \   n   i   g   h   t   BEL

I need to decode \\, \r and \n only.

The desired output (in hex):

64  61  79  0D  0A  5C  6E  69  67  68  74  07
d   a   y   CR  LF  \   n   i   g   h   t   BEL

Using decode doesn't work:

>>> print(var.decode('ascii'))
AttributeError: 'str' object has no attribute 'decode'. Did you mean: 'encode'?

Using a regex to find \\, \r and \n and replace each with the character it represents is unsuccessful: the \n in \night is treated as 0x0A.
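The question does not show the code that was tried, so the following is only a guess at how this failure can arise. It is a minimal sketch using chained `str.replace` calls instead of a regex, and `naive_decode` is a hypothetical name chosen for illustration. Once `\\` has been collapsed to `\`, a later pass can no longer tell that the resulting `\n` came from `\\night`.

```python
# Rebuild the raw data from the hex dump in the question:
# literal backslashes, plus a trailing BEL (0x07).
var = bytes.fromhex("64 61 79 5C 72 5C 6E 5C 5C 6E 69 67 68 74 07").decode("ascii")

def naive_decode(s):
    """Hypothetical multi-pass approach, shown only to illustrate the pitfall."""
    s = s.replace("\\\\", "\\")  # \\ -> \   (turns \\night into \night)
    s = s.replace("\\r", "\r")   # \r -> CR
    s = s.replace("\\n", "\n")   # \n -> LF  (also hits the \n left over in \night)
    return s

result = naive_decode(var)
print(result.encode("ascii").hex(" ").upper())
# 64 61 79 0D 0A 0A 69 67 68 74 07
```

The second `0A` replaces the `5C 6E` that the desired output keeps. Any approach that processes the string in more than one pass runs the same risk, which suggests decoding all three escapes in a single left-to-right scan.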

Is it possible to specify which characters I want to decode, or is there a more appropriate module? I'm using Python 3.10.2.


Solution

  • Many thanks to everyone who contributed answers, but none of them solved my issue completely. After a long time researching, I found this solution from sahil Kothiya (mirror) and modified it to handle my specific case:

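    import codecs
    import re
    
    # Match only the three escapes that should be decoded: \\, \n and \r.
    ESCAPE_SEQUENCE_RE = re.compile(r'''
        ( \\[\\nr]  # Single-character escapes
        )''', re.UNICODE | re.VERBOSE)
    
    def decode_escapes(s):
        def decode_match(match):
            # Decode just the matched escape; the rest of the string is untouched.
            return codecs.decode(match.group(0), 'unicode-escape')
        return ESCAPE_SEQUENCE_RE.sub(decode_match, s)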
    

    Demonstration in IDLE:

    [image: IDLE demo]

    Special characters shown in Notepad++:

    [image: NP++ demo]

    Hex dump of output string:

    [image: hexdump]


    It even works with Unicode characters (an important requirement for my script).

    Demonstration in IDLE:

    [image: IDLE demo-2]

    Special characters shown in Notepad++:

    [image: NP++ demo-2]

    Hex dump of output string:

    [image: hexdump-2]
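As a text-only check of the solution above, the sketch below runs `decode_escapes` on the exact bytes from the question and compares the result with the desired hex dump. The non-ASCII sample string is an assumption made up for illustration, not data from the original post, and the regex is written on one line rather than in verbose mode, which should match the same three escapes.

```python
import codecs
import re

# Same pattern as in the solution, without re.VERBOSE: a backslash
# followed by a backslash, n, or r.
ESCAPE_SEQUENCE_RE = re.compile(r"\\[\\nr]")

def decode_escapes(s):
    def decode_match(match):
        # Only the two-character escape is passed to the codec.
        return codecs.decode(match.group(0), "unicode-escape")
    return ESCAPE_SEQUENCE_RE.sub(decode_match, s)

# Input rebuilt from the hex dump in the question.
var = bytes.fromhex("64 61 79 5C 72 5C 6E 5C 5C 6E 69 67 68 74 07").decode("ascii")
decoded = decode_escapes(var)
print(decoded.encode("ascii").hex(" ").upper())
# 64 61 79 0D 0A 5C 6E 69 67 68 74 07

# Illustrative non-ASCII input (hypothetical sample).
sample = "café\\r\\n日本"
assert decode_escapes(sample) == "café\r\n日本"
```

This works because `re.sub` scans left to right and never rescans text it has already replaced: the `\\` before `night` is consumed as one match, so the following `n` is never paired with a backslash. Non-ASCII characters survive because only the matched ASCII escape sequences ever reach `codecs.decode`.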