
Converting Byte to String and Back Properly in Python3?


Given an arbitrary bytes object (i.e. not only numbers or characters!), I need to convert it to a string and then back to the initial bytes without losing information. This seems like a basic task, but I ran into the following problems:

Assuming:

rnd_bytes = b'w\x12\x96\xb8'
len(rnd_bytes)

prints: 4

Now, converting it to a string. Note: I need to set backslashreplace, because decoding otherwise raises a UnicodeDecodeError, and any other error handler would lose information.

my_str = rnd_bytes.decode('utf-8' , 'backslashreplace')

Now I have the string. I want to convert it back to exactly the original bytes (size 4!).

According to Python resources and this answer, there are different possibilities:

conv_bytes = bytes(my_str, 'utf-8')
conv_bytes = my_str.encode('utf-8')

But len(conv_bytes) returns 10.
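
A short sketch of where the length 10 comes from, using only the values shown above: with the backslashreplace handler, each byte that is not valid UTF-8 is replaced by a four-character escape (a literal backslash, 'x', and two hex digits). The string therefore holds 1 + 1 + 4 + 4 = 10 characters, all of them ASCII.

```python
rnd_bytes = b'w\x12\x96\xb8'

# 0x96 and 0xb8 are not valid UTF-8 here, so each becomes the text \x96 / \xb8
my_str = rnd_bytes.decode('utf-8', 'backslashreplace')

print(len(my_str))   # 10
print(list(my_str))  # ['w', '\x12', '\\', 'x', '9', '6', '\\', 'x', 'b', '8']

# Every character is ASCII, so UTF-8 encoding yields one byte per character
conv_bytes = my_str.encode('utf-8')
print(len(conv_bytes))  # 10
```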

I tried to analyse the outcome:

>>> repr(rnd_bytes)
"b'w\\x12\\x96\\xb8'"
>>> repr(my_str)
"'w\\x12\\\\x96\\\\xb8'"
>>> repr(conv_bytes)
"b'w\\x12\\\\x96\\\\xb8'"

It would make sense to replace the '\\\\'. However, my_str.replace('\\\\','\\') doesn't change anything, probably because four backslashes in a literal represent only two. So my_str.replace('\\','\') would find the '\\\\', but it leads to

SyntaxError: EOL while scanning string literal

due to the last argument '\'. This has been discussed here, where the following suggestion came up:

>>> my_str2=my_str.encode('utf_8').decode('unicode_escape')
>>> repr(my_str2)
"'w\\x12\\x96¸'"

This replaces the '\\\\' but seems to add / change some other characters:

>>> conv_bytes2 = my_str2.encode('utf8')
>>> len(conv_bytes2)
6
>>> repr(conv_bytes2)
"b'w\\x12\\xc2\\x96\\xc2\\xb8'"

There must be a proper way to convert a (complex) bytes object to a string and back. How can I achieve that?
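
The 6 bytes produced by the unicode_escape attempt above can be explained with a short sketch based on the same values. unicode_escape turns the text \x96 and \xb8 into the code points U+0096 and U+00B8, and UTF-8 needs two bytes for any code point above U+007F. Encoding the same string with latin-1, which maps code points 0 to 255 onto single bytes, happens to give back the original four bytes for this input. It is not a general round trip, though: input that already contains a literal backslash escape gets altered, as the last lines show.

```python
rnd_bytes = b'w\x12\x96\xb8'
my_str = rnd_bytes.decode('utf-8', 'backslashreplace')
my_str2 = my_str.encode('utf-8').decode('unicode_escape')

# Four characters, whose code points equal the original byte values
print([hex(ord(c)) for c in my_str2])  # ['0x77', '0x12', '0x96', '0xb8']

# UTF-8 spends two bytes on each code point above 0x7f
print(my_str2.encode('utf-8'))    # b'w\x12\xc2\x96\xc2\xb8'

# latin-1 spends exactly one byte per code point in the range 0..255
print(my_str2.encode('latin-1'))  # b'w\x12\x96\xb8'

# Counterexample: the four bytes \ x 4 1 collapse to the single byte A
lossy = b'\\x41'.decode('utf-8', 'backslashreplace')
collapsed = lossy.encode('utf-8').decode('unicode_escape').encode('latin-1')
print(collapsed)  # b'A'
```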


Solution

  • You could try to convert it to hex format. Then it is easy to convert it back to byte format.

    Sample code to convert bytes to string:

    hex_str = rnd_bytes.hex()
    

    Here is what 'hex_str' looks like:

    '771296b8'
    

    And code for converting it back to bytes:

    new_rnd_bytes = bytes.fromhex(hex_str)
    

    The result is:

    b'w\x12\x96\xb8'
    

    For processing, you can build a string with one character per byte:

    readable_str = ''.join(chr(int(hex_str[i:i+2], 16)) for i in range(0, len(hex_str), 2))
    

    But never encode the readable string with UTF-8: every character above 0x7f would become two bytes. Here is what the readable string looks like:

    'w\x12\x96¸'
    

    After processing the readable string, convert it back to hex format before converting it to bytes:

    # '02x' zero-pads each value to two hex digits, e.g. chr(5) -> '05'
    hex_str = ''.join(format(ord(c), '02x') for c in readable_str)
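
The steps in this answer can be combined into a small round-trip check. This is a minimal sketch; the helper names to_hex_str and from_hex_str are made up for illustration. It also shows that the one-character-per-byte "readable" string is the same as decoding with latin-1, which maps each byte value 0 to 255 to the code point with the same number and so round-trips any bytes object.

```python
def to_hex_str(data: bytes) -> str:
    """Hypothetical helper: bytes -> lowercase hex string (two digits per byte)."""
    return data.hex()


def from_hex_str(hex_str: str) -> bytes:
    """Hypothetical helper: hex string -> the original bytes."""
    return bytes.fromhex(hex_str)


rnd_bytes = b'w\x12\x96\xb8'
hex_str = to_hex_str(rnd_bytes)
print(hex_str)                             # 771296b8
print(from_hex_str(hex_str) == rnd_bytes)  # True

# The round trip holds for every possible byte value, including 0x00..0x0f
all_bytes = bytes(range(256))
assert from_hex_str(to_hex_str(all_bytes)) == all_bytes

# One character per byte: equivalent to the chr(int(..., 16)) expression
readable_str = rnd_bytes.decode('latin-1')
print(readable_str.encode('latin-1') == rnd_bytes)  # True
assert all_bytes.decode('latin-1').encode('latin-1') == all_bytes
```

Hex doubles the length but stays pure ASCII, which suits logs or text files; latin-1 keeps one character per byte but may contain control characters.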