Search code examples
pythonstructunpack

Alignment/Packing in Python Struct.Unpack


I have a piece of hardware sending data at a fixed length: 2bytes, 1 bytes, 4 bytes, 4 bytes, 2 bytes, 4bytes for a total of 17 bytes. If I change my format to 18bytes the code works but values are incorrect.

format = '<2s1s4s4s2s4s'
print(struct.calcsize(format))
print(len(hardware_data))
splitdata = struct.unpack(format,hardware_data)

The output is 17, 18 and an error because of the mismatch. I think this is caused by alignment but I'm unsure and nothing I've tried had fixed this. Below are a couple typical strings, if I print(hardware_data) I noticed the 'R' and 'n' characters but I'm unsure how to handle.

b'\x18\x06\x00R\x1f\x01\x00\x00\x00\x00\x00\xd8\xff\x00\x00\x00\x00\x80'

b'\x18\x06\x00R\x1f\x01\x00\x00\x00\x00\x00\n\x00\x00\x00\x00\x00\x80'


Solution

  • Odds are whatever is sending the data is padding it in some way you're not expecting.

    For example, if the first four byte field is supposed to represent an int, C struct padding rules would require a padding byte, after the one byte field (to align the next four byte field to four byte alignment). So just add the padding byte explicitly, changing your format string to:

    format = '<2s1sx4s4s2s4s'
    

    The x in there says "I expect a byte here, but it's padding, don't unpack it to anything." It's possible the pad byte belongs elsewhere (I have no idea what your hardware is doing); I notice the third byte is the NUL (\0) byte in both examples, but the spot I assumed would be padding is 'R', so it's possible you want:

    format = '<2sx1s4s4s2s4s'
    

    instead. Or it could be somewhere else (without knowing which of the fields is a char array in the hardware struct, and which are larger types with alignment requirements, it's impossible to say). Point is, your hardware is sending 18 bytes; figure out which one is garbage, and put the x pad byte at the appropriate location.

    Side-note: The repr of bytes objects will use ASCII or simpler ASCII escapes when available. That's why you see an R and a \n in your output; b'R' and b'\x52' are equivalent literals, as are b'\n' and b'\x0a' and Python chooses to use the "more readable" version (when the bytes is actually just ASCII, this is much more readable).