python python-3.x encoding non-ascii-characters

Issue in encode/decode in python 3 with non-ascii character

I am trying to use python3 unicode_escape to escape \n in my string, but the challenge is there are non-ascii characters present in the whole string, and if I use utf8 to encode and then decode the bytes using unicode_escape then the special character gets garbled. Is there any way to have the \n escaped with a new line without garbling the special character?

s = "hello\\nworld└--"
print(s.encode('utf8').decode('unicode_escape'))

Expected Result:
hello
world└--

Actual Result:
hello
worldâ--

Solution

As user wowcha observes, the unicode-escape codec assumes a latin-1 encoding, but your string contains a character that is not encodable as latin-1.

>>> s = "hello\\nworld└--"
>>> s.encode('latin-1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2514' in position 12: ordinal not in range(256)

Encoding the string as utf-8 gets around the encoding problem, but results in mojibake when decoding from unicode-escape

The solution is to use the backslashreplace error handler when encoding. This will convert the problem character to an escape sequence that can be encoded as latin-1 and does not get mangled when decoded from unicode-escape.

>>> s.encode('latin-1', errors='backslashreplace')
b'hello\\nworld\\u2514--'

>>> s.encode('latin-1', errors='backslashreplace').decode('unicode-escape')
'hello\nworld└--'

>>> print(s.encode('latin-1', errors='backslashreplace').decode('unicode-escape'))
hello
world└--