How can I encode Unicode to bytes so that the original string can be retrieved (Python 3.11)?

In Python 3.11 we can encode a string like this:

string.encode('ascii', 'backslashreplace')

This works neatly for, say: hellö => hell\\xf6

However, when I insert hellö w\\xf6rld I get hell\\xf6 w\\xf6rld (notice that the second word contains a literal part that merely looks like a character escape sequence)

In other words, the following holds:

'hellö wörld'.encode('ascii', 'backslashreplace') == 'hellö w\\xf6rld'.encode('ascii', 'backslashreplace')

This means that data has been lost by the encoding: two different strings produce the same bytes.

Is there a way to make Python encode this reversibly, so that backslashes are themselves escaped? Or is there a library that does so?

Solution

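Use the unicode_escape codec with no error handler instead of the ascii codec with the backslashreplace error handler. The ascii codec cannot represent non-ASCII characters, and backslashreplace substitutes an escape sequence for each of them without escaping backslashes that are already in the string, which is what causes the loss. The unicode_escape result also contains only ASCII characters, but it escapes existing backslashes as well: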

>>> 'hellö wörld'.encode('unicode_escape') == 'hell\\xf6 w\\xf6rld'.encode('unicode_escape')
False
>>> 'hellö wörld'.encode('unicode_escape')
b'hell\\xf6 w\\xf6rld'
>>> 'hell\\xf6 w\\xf6rld'.encode('unicode_escape')
b'hell\\\\xf6 w\\\\xf6rld'
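To get the original string back, the same codec can be used for decoding. The following is a minimal sketch of the round trip; the helper names to_ascii_bytes and from_ascii_bytes are made up for illustration and are not part of any library:

```python
def to_ascii_bytes(text: str) -> bytes:
    # Hypothetical helper: escapes non-ASCII characters AND existing backslashes.
    return text.encode('unicode_escape')


def from_ascii_bytes(data: bytes) -> str:
    # Hypothetical helper: reverses to_ascii_bytes.
    return data.decode('unicode_escape')


samples = ['hellö wörld', 'hell\\xf6 w\\xf6rld', 'tab\there', '😀']
for s in samples:
    encoded = to_ascii_bytes(s)
    assert encoded.isascii()               # output is pure ASCII
    assert from_ascii_bytes(encoded) == s  # and the original is restored
```

Note that decoding with unicode_escape is only meant for bytes that were produced this way; it should not be applied to arbitrary UTF-8 data.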

If you don't have an ASCII requirement, just use .encode() (the default is UTF-8 in Python 3, which handles all Unicode), then .decode() to restore the string.
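As a small sketch of that UTF-8 round trip (the variable names are only illustrative):

```python
original = 'hellö w\\xf6rld'   # contains both a real ö and a literal backslash

data = original.encode()       # UTF-8 by default; backslashes are left untouched
restored = data.decode()       # UTF-8 by default

assert isinstance(data, bytes)
assert restored == original
```

Since UTF-8 does no escaping at all, a literal backslash and a real ö can never collide, so there is nothing to lose.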