I have a Python variable (named var) containing a string with the following literal data:
day\r\n\\night
In hex, it is:
64 61 79 5C 72 5C 6E 5C 5C 6E 69 67 68 74 07
d a y \ r \ n \ \ n i g h t BEL
I need to decode \\, \r and \n only.
The desired output (in hex):
64 61 79 0D 0A 5C 6E 69 67 68 74 07
d a y CR LF \ n i g h t BEL
Using decode doesn't work:
>>> print(var.decode('ascii'))
AttributeError: 'str' object has no attribute 'decode'. Did you mean: 'encode'?
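For context, here is a minimal sketch of why this fails, assuming the string is built as shown in the hex dump above. In Python 3 a str is already decoded text, so only bytes objects have .decode(); and a round trip through the ascii codec would not interpret the backslash sequences anyway. The names raw and roundtrip are mine, for illustration only.

```python
# The string from the question: the backslash sequences are literal characters.
var = r"day\r\n\\night" + "\x07"

# In Python 3 only bytes objects have .decode(); str objects have .encode().
print(hasattr(var, "decode"))   # False

raw = var.encode("ascii")       # str -> bytes
print(raw.hex(" ").upper())     # same bytes as the hex dump in the question

# Decoding as ASCII only maps bytes back to characters; escapes stay literal.
roundtrip = raw.decode("ascii")
print(roundtrip == var)         # True
```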
Using a regex to find and replace \\, \r and \n with the characters they represent is unsuccessful, as the \n in \night is treated as 0x0A.
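The pitfall can be reproduced without a regex at all. The sketch below assumes the replacements are applied one after another (my guess at what went wrong, since the original attempt isn't shown); whichever order is used, one backslash ends up being consumed twice. The names wanted, first and second are illustrative.

```python
var = r"day\r\n\\night" + "\x07"
wanted = "day\r\n\\night\x07"   # CR, LF, one backslash, "night", BEL

# Attempt 1: unescape the double backslash first.
# The backslash it produces then pairs with the "n" of "night".
first = var.replace("\\\\", "\\").replace("\\r", "\r").replace("\\n", "\n")

# Attempt 2: unescape \r and \n first.
# The second backslash of the pair is consumed together with the "n".
second = var.replace("\\r", "\r").replace("\\n", "\n").replace("\\\\", "\\")

print(first == wanted, second == wanted)   # False False
```

A single left-to-right pass that consumes each backslash exactly once avoids both failures.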
Is it possible to specify which characters I want to decode, or is there a more appropriate module? I'm using Python 3.10.2.
Many thanks to everyone who contributed answers, but none of them solved my issue completely. After a long time of research I found this solution from sahil Kothiya (mirror), which I modified to resolve my specific issue:
import codecs
import re

# Match only the three escapes that should be decoded: \\, \n and \r
ESCAPE_SEQUENCE_RE = re.compile(r'''
    ( \\[\\nr]   # Single-character escapes
    )''', re.UNICODE | re.VERBOSE)

def decode_escapes(s):
    def decode_match(match):
        # Decode just the matched escape, not the whole string
        return codecs.decode(match.group(0), 'unicode-escape')
    return ESCAPE_SEQUENCE_RE.sub(decode_match, s)
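For comparison, here is a sketch of the same single-pass idea using a lookup table instead of the unicode-escape codec, checked against the hex from the question. ESCAPES, ESCAPE_RE and decode_selected are names I made up for this example; they are not part of the solution above.

```python
import re

# Hypothetical lookup table: only these three escapes are decoded.
ESCAPES = {r"\\": "\\", r"\r": "\r", r"\n": "\n"}
ESCAPE_RE = re.compile(r"\\[\\nr]")

def decode_selected(s):
    # One left-to-right pass; each backslash is consumed at most once.
    return ESCAPE_RE.sub(lambda m: ESCAPES[m.group(0)], s)

var = r"day\r\n\\night" + "\x07"
out = decode_selected(var)
print(out.encode("ascii").hex(" ").upper())
# 64 61 79 0D 0A 5C 6E 69 67 68 74 07
```

Because the regex scans the input once, the backslash produced from \\ is never re-examined, so the n of night stays a plain letter.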
(Screenshots: demonstration in IDLE, special characters shown in Notepad++, and hex dump of the output string.)
It even works with Unicode characters (an important component to my script).
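A small sketch of why non-ASCII text survives: only the matched two-character ASCII escape is handed to the codec, so the rest of the string is never touched. Passing the whole string to unicode-escape instead is known to garble non-ASCII characters. The sample string text is made up for illustration, and the regex is written without re.VERBOSE for brevity.

```python
import codecs
import re

ESCAPE_SEQUENCE_RE = re.compile(r"\\[\\nr]")

def decode_escapes(s):
    # Only the matched ASCII escape is passed to the codec.
    return ESCAPE_SEQUENCE_RE.sub(
        lambda m: codecs.decode(m.group(0), "unicode-escape"), s)

# Hypothetical sample: non-ASCII letters around a literal backslash-n.
text = "café" + r"\n" + "naïve"

selective = decode_escapes(text)
whole = codecs.decode(text, "unicode-escape")

print(repr(selective))   # 'café\nnaïve'
print(repr(whole))       # the accented letters come out garbled
```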
(Screenshots: demonstration in IDLE, special characters shown in Notepad++, and hex dump of the output string.)