Tags: python, python-2.7, binaryfiles, unpack

Unpacking binary files


I have to read a binary file, so I have immersed myself in Python's struct module. However, some things still confuse me. Consider the following chunk of code:

import struct

print struct.pack('5c', *'Hello')
to_pack = (5.9, 14.87, 'HEAD', 32321, 238, 99)
packed = struct.pack('2f4s3i', *to_pack)
print "packed: ", packed

Output:

Hello
packed:  �̼@��mAHEADA~�

I packed, in order, two floats, a 4-character string, and three integers. Then, when unpacking:

unpacked = struct.unpack('2f4s3i', packed)
print "unpacked: ", unpacked

Output:

 unpacked:  (5.900000095367432, 14.869999885559082, 'HEAD', 32321, 238, 99)

So the packing function turned my original data into binary data, while unpacking did the opposite. However, does this mean I have to know how my data is organized, that is, which types are encoded and in what order? If I don't, how could I guess the right type order for my data? For instance, if I do:

unpacked = struct.unpack('2f4s3h', packed)  # I replaced the 3i with 3h 
print "unpacked: ", unpacked

I would get a nice error:

unpacked = struct.unpack('2f4s3h', packed)
struct.error: unpack requires a string argument of length 18

So it seems to me that, whatever binary data I get when reading a binary file, if I don't know the correct types in the right order, I cannot convert it back to its original form.
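The 18 in that error message is the number of bytes the format '2f4s3h' describes, whereas the packed string holds 24 bytes. A small sketch using struct.calcsize makes the mismatch visible before unpacking. The variable names are illustrative, and the sizes in the comments assume a platform where a C int is 4 bytes and a short is 2:

```python
import struct

# Same values as in the question; b'HEAD' keeps this runnable on Python 3 too.
packed = struct.pack('2f4s3i', 5.9, 14.87, b'HEAD', 32321, 238, 99)

right_size = struct.calcsize('2f4s3i')  # 2*4 + 4 + 3*4 bytes
wrong_size = struct.calcsize('2f4s3h')  # 2*4 + 4 + 3*2 bytes

print(len(packed))   # number of bytes actually present
print(right_size)    # equals len(packed), so unpack succeeds
print(wrong_size)    # differs from len(packed), so unpack raises struct.error
```

Note that a size check only catches formats of the wrong length. A wrong format that happens to have the right length unpacks without complaint and returns meaningless values.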

Is there a way to convert the data back to non-binary without specifying the expected types, or would I really be stuck with a non-usable binary file?

For example, how would people who create huge binary files from gigantic ones manage to retrieve their data successfully?

For information, my example was taken from this pdf file: https://gebloggendings.files.wordpress.com/2012/07/struct.pdf


Solution

  • Yes, it's raw binary data, so you need to tell Python about its structure in order to usefully unpack it. Python doesn't know whether that 24-byte blob of data you created in packed is 6 floats, or 6 ints, or 3 doubles, or any combination of those, or something completely different. The calls below assume unpack has been imported from struct:

    >>> unpack('6f', packed)
    (5.900000095367432, 14.869999885559082, 773.08251953125, 4.5291367665442413e-41, 3.3350903450930646e-43, 1.3872854796815689e-43)
    >>> unpack('6i', packed)
    (1086115021, 1097722757, 1145128264, 32321, 238, 99)
    >>> unpack('3d', packed)
    (15686698.023046875, 6.8585591728324e-310, 2.10077583423e-312)
    >>> unpack('dfid', packed)
    (15686698.023046875, 773.08251953125, 32321, 2.10077583423e-312)
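In practice, whoever writes a binary file also knows (and ideally documents) its layout, and the reader reuses that same layout. Below is a minimal sketch of that idea, not a definitive implementation. The record layout is made up for illustration, RECORD, write_records and read_records are hypothetical names rather than part of the struct module, and io.BytesIO stands in for a real file:

```python
import io
import struct

# Hypothetical record layout shared by writer and reader:
# two floats, a 4-byte tag, three 32-bit ints, little-endian, no padding.
RECORD = struct.Struct('<2f4s3i')

def write_records(stream, records):
    """Pack each tuple with the shared layout and append it to the stream."""
    for rec in records:
        stream.write(RECORD.pack(*rec))

def read_records(stream):
    """Yield one tuple per record until no complete record is left."""
    while True:
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break
        yield RECORD.unpack(chunk)

buf = io.BytesIO()  # stands in for a file opened in 'wb' / 'rb' mode
write_records(buf, [(1.5, 2.25, b'HEAD', 32321, 238, 99),
                    (0.5, -4.0, b'TAIL', 1, 2, 3)])
buf.seek(0)
records = list(read_records(buf))
print(records)
```

The explicit '<' prefix fixes the byte order and field sizes, so the same format string describes the same bytes on every machine. Without a prefix, struct uses the native sizes and alignment of the platform that runs the code.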