
Parsing binary messages with Kaitai Struct & Python


I need to extract and process data (variably-sized binary messages) from a very large message log. Using the GIF example and the online documentation, I have defined the variably-sized message layout and compiled it into msg_log.py. Calling msg_log.from_file("small_logfile") lets me inspect and verify field values for the first message in the log file.
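
Concretely, the single-message inspection that works looks like this (Message being the class generated from the spec below):

    from msg_log import Message

    # from_file parses one structure starting at the beginning of the file
    msg = Message.from_file("small_logfile")
    print(msg.msg_header.length)  # fields of the first message are accessible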

For small logfiles that fit in memory, how do I get msg_log.py to inspect the 2nd, 3rd, and subsequent messages in the log?

For very large logfiles, I would expect to page the input through a byte buffer, but I haven't done that yet and haven't found examples or discussion of how to go about it. How do I keep msg_log.py in sync with the paged byte buffer as its contents change?

My message structure is currently defined as follows. (I have also tried "seq" instead of "instances", but could still only inspect the first message.)

meta:
  id: message
  endian: be
instances:
  msg_header:
    pos: 0x00
    type: message_header
  dom_header:
    pos: 0x06
    type: domain_header
  body:
    pos: 0x2b
    size: msg_header.length - 43
types:
  message_header:
    seq:
      - id: length
        type: u1
      <other fixed-size fields - 5 bytes>
  domain_header:
    seq:
      <fixed-size fields - 37 bytes>
  message_body:
    seq:
      - id: body
        size-eos: true

Solution

  • Parsing multiple structures in a row from a single stream can be achieved by something like:

    from msg_log import Message
    from kaitaistruct import KaitaiStream
    
    f = open("yourfile.bin", "rb")
    stream = KaitaiStream(f)
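    # each constructor call parses the next message from the stream's current position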
    obj1 = Message(stream)
    obj2 = Message(stream)
    obj3 = Message(stream)    
    # etc
    stream.close()
    

    I'm not sure what you mean by "paging through a byte buffer". The method above by itself does not load the whole file into memory; it reads it using normal read()-like calls as needed.
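
    For a log with an unknown number of messages, the same idea can be wrapped in a loop that runs until the end of the stream; a minimal sketch, using the runtime's is_eof() check (and assuming the message type parses its fields via seq, so that each parse actually advances the stream; lazily-read instances with an absolute pos would not move the cursor):

    from msg_log import Message
    from kaitaistruct import KaitaiStream

    with open("yourfile.bin", "rb") as f:
        stream = KaitaiStream(f)
        messages = []
        while not stream.is_eof():
            # each iteration consumes exactly one message's bytes
            messages.append(Message(stream))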

    If you want somewhat better performance and you are dealing with a large file of fixed size, you can opt for memory mapping. That way you would just be using a region of memory, and the OS would take care of the input/output required to load the relevant parts of the file into physical memory. For Python, there is a PR for the runtime that implements helpers for that, or you can do it yourself:

    from msg_log import Message
    from kaitaistruct import KaitaiStream
    from io import BytesIO
    import mmap

    f = open("yourfile.bin", "rb")
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
        stream = KaitaiStream(BytesIO(buf))
        obj1 = Message(stream)
        obj2 = Message(stream)
        obj3 = Message(stream)
        # etc
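
    One caveat with this do-it-yourself variant: io.BytesIO copies the bytes it is given, so constructing it from the mapping reads the whole file into memory once; subsequent reads are fast, but you lose the on-demand paging that a pure mmap approach would give you.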