python, python-3.x, decoding, utf8-decode, bytestream

Problems with decoding bytes into string or ASCII in Python 3


I'm having a problem decoding received bytes with Python 3. I'm controlling an Arduino via a serial connection and reading from it with the following code:

import serial

# Open the Arduino's serial port; readline() gives up after 20 seconds.
arduino = serial.Serial('/dev/ttyACM0', baudrate=9600, timeout=20)
print(arduino.is_open)
myData = arduino.readline()  # returns bytes, not str
print(myData)

The output looks like b'\xe1\x02\xc1\x032\x82\x83\x10\x83\xb2\x80\xb0\x92\x0b\xa0' or b'\xe1\x02"\xe1\x00\x83\x92\x810\x82\xb2\x82\x91\xb2\n'. I tried to decode it the usual way via myData.decode('utf-8'), but I get the error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 1: invalid start byte. I tried other encodings (ASCII, cp437, hex, utf-16), but always face the same error.

Do you have any suggestions for how I can decode the received bytes, or which encoding the Arduino uses? I already tried decoding the data piece by piece in a for loop, but I always get the same error message.

And is there a general way to avoid decoding problems, or to find out which encoding I have to use?

Thanks in advance.


Solution

  • As @jsbueno said in the comments, this is not a decoding problem: the bytes being received are probably binary data rather than encoded text. I had a very similar problem when reading binary data (bytes) from a file.

    There are two options here. The first is the struct module:

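    import struct

    # Open in binary mode so read() returns bytes.
    with open("somedata.img", "rb") as a:
        b = a.read(2)  # first 2 bytes, read separately
        # "i" is a signed int in native byte order (4 bytes on common platforms)
        file_size, = struct.unpack("i", a.read(4))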
    

    struct.unpack always returns a tuple, even for a single value. The trailing comma after file_size unpacks that tuple into an integer; struct.unpack('i', a.read(4))[0] is equivalent.

    Another way, which I used because I wanted the data in a NumPy array, is:

    import numpy as np

    # Open in binary mode; every 4 bytes become one unsigned 32-bit integer.
    with open("somefile.img", "rb") as f:
        a = np.fromfile(f, dtype=np.uint32)
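
The same idea can be applied directly to bytes that came from the serial port instead of a file. The sketch below is only an illustration: it takes one of the byte strings quoted in the question as sample input, prints it as hex so the raw values can be inspected, and then unpacks it with struct. The helper names hex_dump and unpack_uint16_le are made up for this example, and the layout (consecutive little-endian unsigned 16-bit integers) is purely an assumption. The real layout depends on what the Arduino sketch actually sends.

```python
import struct

# Sample input: one of the byte strings quoted in the question.
raw = b'\xe1\x02\xc1\x032\x82\x83\x10\x83\xb2\x80\xb0\x92\x0b\xa0'


def hex_dump(data: bytes) -> str:
    """Return the bytes as space-separated two-digit hex values."""
    return " ".join(f"{byte:02x}" for byte in data)


def unpack_uint16_le(data: bytes) -> tuple:
    """Interpret data as consecutive little-endian unsigned 16-bit integers.

    ASSUMPTION: this layout is hypothetical and chosen only for illustration.
    A trailing odd byte is ignored.
    """
    count = len(data) // 2
    return struct.unpack(f"<{count}H", data[:count * 2])


print(len(raw))               # number of bytes received
print(hex_dump(raw))          # inspect the raw values before guessing a format
print(unpack_uint16_le(raw))  # one possible numeric interpretation

# Replacing invalid bytes avoids the UnicodeDecodeError, but it only hides
# the fact that the data is not text.
print(repr(raw.decode("utf-8", errors="replace")))
```

If the hex dump shows no recognizable pattern under any layout, the format string is not the problem, and it is worth checking what the sending side actually writes to the port.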