Search code examples
pythonbinaryfiles

Parsing binary files with Python


As a side project I would like to try to parse binary files (Mach-O files specifically). I know tools exist for this already (otool) so consider this a learning exercise.

The problem I'm hitting is that I don't understand how to convert the binary elements found into a python representation. For example, the Mach-O file format starts with a header which is defined by a C Struct. The first item is a uint_32 'magic number' field. When i do

magic = f.read(4)

I get

b'\xcf\xfa\xed\xfe'

This is starting to make sense to me. It's literally a byte array of 4 bytes. However I want to treat this like a 4-byte int that represents the original magic number. Another example is the numberOfSections field. I just want the number represented by 4-byte field, not an array of literal bytes.

Perhaps I'm thinking about this all wrong. Has anybody worked on anything similar? Do I need to write functions to look these 4-byte byte arrays and shift and combine their values to produce the number I want? Is endienness going to screw me here? Any pointers would be most helpful.


Solution

  • Take a look at the struct module:

    In [1]: import struct
    
    In [2]: magic = b'\xcf\xfa\xed\xfe'
    
    In [3]: decoded = struct.unpack('<I', magic)[0]
    
    In [4]: hex(decoded)
    Out[4]: '0xfeedfacf'