
Python read file by bytes until sequence of bytes


How can I read a file in Python byte-by-byte until a specific sequence of bytes is reached?

This must happen all the time with libraries that read specific kinds of files to parse the header, scan for parameters, etc.

As an example: I'm reading through the PNG spec and see that pixel data starts after the byte sequence IDAT.

I can read the file like this:

with open('image.png', 'rb') as f:
    byte = f.read(1)
    # In binary mode read() returns b'' (not '') at end of file
    while byte != b'':
        byte = f.read(1)

But since I'm only reading one byte at a time, I can't watch for IDAT directly (I'd only get the I but not the other three bytes). I can't read the file in chunks of four bytes either, because the sequence won't always start on a four-byte boundary.

I can imagine keeping track of the last four bytes but thought perhaps there was a more elegant way?


Solution

  • If you aren't married to the idea of going byte by byte, you can read the whole file into a single bytes object and then split it on occurrences of IDAT.

    with open('image.png', 'rb') as f:
        data = f.read()  # whole file as one bytes object
    # [1:] drops everything before the first b'IDAT';
    # each remaining piece is the data that follows one IDAT marker
    idat_parts = data.split(b'IDAT')[1:]
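
Reading the whole file into memory may be wasteful for large files. As a rough sketch of the "keep track of the last few bytes" idea from the question, the file can instead be read in chunks while carrying over the last `len(needle) - 1` bytes, so a match that straddles a chunk boundary is still found. The helper name `find_sequence` and the `chunk_size` default are assumptions made for illustration, not part of any library.

```python
def find_sequence(path, needle, chunk_size=64 * 1024):
    """Return the file offset of the first occurrence of `needle`, or -1.

    Hypothetical helper: reads the file in chunks and keeps a short tail
    from the previous chunk so matches across chunk boundaries are found.
    """
    if not needle:
        raise ValueError("needle must be a non-empty bytes object")

    overlap = len(needle) - 1   # bytes to carry over between chunks
    tail = b''                  # leftover bytes from the previous window
    offset = 0                  # file offset of the first byte of the window

    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return -1       # end of file, no match
            window = tail + chunk
            idx = window.find(needle)
            if idx != -1:
                return offset + idx
            # Keep only the bytes that could still begin a match
            tail = window[-overlap:] if overlap else b''
            offset += len(window) - len(tail)
```

With the offset in hand, `f.seek(offset + len(needle))` positions a file object just past the marker. Note that a plain byte search can also match the bytes IDAT where they happen to occur inside other data, so this is only a sketch of the search itself, not a PNG parser.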