Tags: python, format, unpack

struct.unpack() requires an unexpected length from a bytes object with specific format patterns


I'm trying to decode a bytes object with the 'BQ' format (i.e., unsigned char + unsigned long long) on Python 3.6.2. Its length should be 9 bytes, but struct.unpack raises an error asking for more bytes:

In [96]: struct.unpack('BQ',bytesObj)
---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
<ipython-input-96-667267f631a1> in <module>()
----> 1 struct.unpack('BQ',bytesObj)

error: unpack requires a bytes object of length 16

When I change the order of the format specifier to 'QB', it doesn't complain about the length, although it's supposed to be the same:

In [97]: struct.unpack('QB',bytesObj)
Out[97]: (35184770581765, 0)

But it gets even stranger when I replace 'B' with 'f', which should increase the required length by 3 bytes, yet the error stays the same:

In [98]: struct.unpack('fQ',bytesObj)
---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
<ipython-input-98-c3792c78fd43> in <module>()
----> 1 struct.unpack('fQ',bytesObj)

error: unpack requires a bytes object of length 16

In [99]: struct.unpack('Qf',bytesObj)
---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
<ipython-input-99-78065617d606> in <module>()
----> 1 struct.unpack('Qf',bytesObj)

error: unpack requires a bytes object of length 12

No matter which format I put before 'Q', I always get the same error asking for a length of 16. It seems to work fine only when no format precedes 'Q'.

Am I missing something?


Solution

  • The jump from 9 to 16 bytes happens because Python adds padding bytes so that the elements of a struct are aligned on the same boundaries as in C.

    There is an explanation for this in the struct module documentation (section 7.3 of the Python 2 manual).

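    In native mode (the default, when the format string has no byte-order prefix), the q format elements (long long) and Q format elements (unsigned long long) are forced to START on a boundary matching their native alignment, which is 8 bytes on most 64-bit platforms. Padding bytes are added AFTER any elements that come BEFORE q/Q to ensure this. No padding is appended after the last element, which is why 'QB' still needs only 9 bytes.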

    Running the following code shows this in action:

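    # Python 3 version: print() is a function, and the names are imported explicitly.
    from struct import calcsize, pack, unpack
    
    # Q first: nothing precedes it, so no padding is needed.
    print("QB:", calcsize('QB'))
    bytesObj = pack('QB', 1, 2)
    print(unpack('QB', bytesObj))
    
    # B first: padding is inserted after B so that Q starts aligned.
    print("BQ:", calcsize('BQ'))
    bytesObj = pack('BQ', 1, 2)
    print(unpack('BQ', bytesObj))
    
    # The signed variant q behaves the same way as Q.
    print("qB:", calcsize('qB'))
    bytesObj = pack('qB', 1, 2)
    print(unpack('qB', bytesObj))
    
    print("Bq:", calcsize('Bq'))
    bytesObj = pack('Bq', 1, 2)
    print(unpack('Bq', bytesObj))
    
    # A 4-byte float after Q needs no padding; before Q it does.
    print("Qf:", calcsize('Qf'))
    bytesObj = pack('Qf', 1, 2.0)
    print(unpack('Qf', bytesObj))
    
    print("fQ:", calcsize('fQ'))
    bytesObj = pack('fQ', 1.0, 2)
    print(unpack('fQ', bytesObj))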
    

    On a typical 64-bit platform, this gives the following output:

    QB: 9
    (1, 2)
    BQ: 16
    (1, 2)
    qB: 9
    (1, 2)
    Bq: 16
    (1, 2)
    Qf: 12
    (1, 2.0)
    fQ: 16
    (1.0, 2)
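
To make the padding itself visible, here is a minimal sketch that inspects the packed bytes directly. The variable names are illustrative, and the amount of padding depends on the platform's native alignment (7 bytes on a typical 64-bit platform), so the sketch computes it instead of hard-coding it.

```python
import struct

# Native mode (no prefix character): struct aligns Q the way the C compiler would.
size_q = struct.calcsize('Q')      # size of Q on its own
total = struct.calcsize('BQ')      # B + padding + Q
pad = total - 1 - size_q           # bytes inserted between B and Q

packed = struct.pack('BQ', 1, 2)
print("total size:", total)
print("padding bytes:", pad)
print("raw bytes:", packed.hex())

# The padding sits right after the one-byte B field and is zero-filled.
print(packed[1:1 + pad] == bytes(pad))
```

The hex dump shows the B value first, then the run of zero padding bytes, then the 8 bytes of the Q value.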
    

    Hope this helps.

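    (Edit): Also, as pointed out by the OP, this default behavior can be overridden by starting the format string with a byte-order character such as '=' or '<', which switches to standard sizes with no alignment padding; see the link in the comment below.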
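
A minimal sketch of that override, assuming the data really is a packed 9-byte record. The `raw` payload below is hypothetical (the question does not show the contents of bytesObj), and its little-endian byte order is an assumption made for the example.

```python
import struct

# Hypothetical 9-byte payload: one unsigned char followed by a
# little-endian unsigned 64-bit integer (an assumption about the data).
raw = bytes([1]) + (2).to_bytes(8, 'little')

# '<' selects little-endian byte order, standard sizes, and no alignment padding.
print(struct.calcsize('<BQ'))      # 9
print(struct.unpack('<BQ', raw))   # (1, 2)

# '=' keeps the machine's native byte order but also drops the padding.
print(struct.calcsize('=BQ'))      # 9
```

Which prefix is right depends on how the bytes were produced: '<' or '>' if the byte order is fixed by a file format or protocol, '=' if the data came from the same machine without C-style alignment.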