Search code examples
pythonlistbyte

How to decode a list of int into a string in Python 3?


So I have a stream of bytes, which I collect into a list as so:

byte_list.append(bytes[0])

This format decodes the bytes into integers (one of the few quirks I am finding about Python is why it decodes bytes into ASCII or integers without my asking)

So after a while I have this list of bytes

byte_list = [83, 0, 116, 0, 97, 0, 110, 0, 100, 0, 97, 0, 114, 0, 100, 0, 70, 0, 105, 0, 114, 0, 109, 0, 97, 0, 116, 0, 97, 0, 46, 0, 105, 0, 110, 0, 111]

How Can I decode this list into it's string values? What I thought of was:

for b in byte_list: 
    new_list.append(chr(byte_list[b]))

But this seems not correct. Could someone please offer guidance on how to decode this?


Solution

  • So I have a stream of bytes

    And you want text. Looking at the data, it is in UTF-16LE (little-endian) encoding. Decode it:

    >>> byte_list = [83, 0, 116, 0, 97, 0, 110, 0, 100, 0, 97, 0, 114, 0, 100, 0, 70, 0, 105, 0, 114, 0, 109, 0, 97, 0, 116,
     0, 97, 0, 46, 0, 105, 0, 110, 0, 111, 0]
    >>> bytes(byte_list).decode('utf-16le')
    'StandardFirmata.ino'
    

    Note that I added a final zero because the data was one byte short for a full UTF-16 stream. I assume the data was just a sample and not complete. UTF-16 requires two to four bytes per character.

    If you started with a byte stream, it is a list of values 0-255. It is only displayed, for convenience, as ASCII:

    >>> bytes(byte_list)
    b'S\x00t\x00a\x00n\x00d\x00a\x00r\x00d\x00F\x00i\x00r\x00m\x00a\x00t\x00a\x00.\x00i\x00n\x00o'
    

    In bytes format, you just need to .decode() it into Unicode text.