
Convert a little-endian 24-bit file to an ASCII array


I have a .raw file containing a 52-line HTML header followed by the data themselves. The data are encoded as little-endian 24-bit signed integers, and I want to convert them to integers in an ASCII file. I use Python 3.
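For reference, each sample in such a file occupies 3 bytes, least significant byte first, in two's complement. Unlike struct, Python 3's built-in int.from_bytes accepts any byte length, so one sample can be decoded directly. A minimal sketch; the helper name sample_value is hypothetical:

```python
def sample_value(three: bytes) -> int:
    # Little-endian: the first byte is the least significant one.
    # signed=True interprets the 24 bits as two's complement.
    return int.from_bytes(three, byteorder='little', signed=True)


print(sample_value(b'\x34\x12\x00'))  # 4660 (0x001234)
print(sample_value(b'\x00\x00\x80'))  # -8388608, the most negative 24-bit value
print(sample_value(b'\xff\xff\xff'))  # -1
```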

I tried to 'unpack' the entire file with the following code found in this post:

import sys
import chunk
import struct

f1 = open('/Users/anais/Documents/CR_lab/Lab_files/labtest.raw', mode = 'rb')
data = struct.unpack('<i', chunk + ('\0' if chunk[2] < 128 else '\xff'))   

But I get this error message:

TypeError: 'module' object is not subscriptable
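In that snippet, `import chunk` imports the standard-library chunk module, so `chunk` on the `struct.unpack` line is that module rather than bytes read from the file (nothing is ever read from `f1`), hence the TypeError. The snippet expects `chunk` to hold 3 bytes, e.g. from `f1.read(3)`, and in Python 3 the padding must be bytes (`b'\0'`, `b'\xff'`), not str. A minimal sketch of that sign-extension idea; the helper name unpack24_padded is hypothetical:

```python
import struct


def unpack24_padded(chunk: bytes) -> int:
    # chunk holds one little-endian 24-bit sample (3 bytes).
    # Append a fourth, most significant byte that repeats the sign:
    # 0x00 for non-negative values, 0xff for negative ones.
    pad = b'\0' if chunk[2] < 128 else b'\xff'
    return struct.unpack('<i', chunk + pad)[0]


print(unpack24_padded(b'\xfe\xff\xff'))  # -2
```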

EDIT

It seems this is better:

data = struct.unpack('<i','\0'+ bytes)[0] >> 8

But I still get an error message:

TypeError: must be str, not type
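Here `bytes` is the built-in type, not data from the file, and `'\0'` is a str in Python 3, so the two cannot be concatenated. Both operands must be bytes: prepend `b'\0'` to 3 bytes actually read from the file. Right shift on a negative Python int is arithmetic, so `>> 8` keeps the sign. A minimal sketch, with an in-memory stand-in for the data part of the file (hypothetical content: the samples 5 and -2):

```python
import io
import struct

# Stand-in for the file positioned after the header: two samples, 5 and -2.
f1 = io.BytesIO(b'\x05\x00\x00' + b'\xfe\xff\xff')

values = []
while True:
    chunk = f1.read(3)            # one 24-bit sample
    if len(chunk) < 3:            # end of data (or incomplete trailing value)
        break
    # b'\0' becomes the least significant byte; >> 8 drops it again.
    values.append(struct.unpack('<i', b'\0' + chunk)[0] >> 8)

print(values)
```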

Easy to fix, I presume?


Solution

  • That's not a nice file to process in Python! Python is great for processing text files: it reads them in big chunks into an internal buffer and then iterates over lines. But once you have read text that way, you cannot easily access the binary data that follows it. Additionally, the struct module has no format code for 24-bit values.

    The only way I can imagine is to read the file one byte at a time: first skip 52 end-of-line markers, then read the data bytes 3 at a time, prepend a null byte to each group to build a 4-byte byte string, and unpack it.

    Possible code:

    import struct

    eol = b'\n'          # or whatever is the end of line in your file
    nlines = 52          # number of lines to skip

    with open('/Users/anais/Documents/CR_lab/Lab_files/labtest.raw', mode='rb') as f1:

        for _ in range(nlines):       # process nlines lines
            t = b''                   # to store the content of each line
            while True:
                x = f1.read(1)        # one byte at a time
                if x == eol:          # ok we have one full line
                    break
                if not x:             # end of file reached inside the header
                    raise EOFError("file ended before the end of the header")
                t += x                # else concatenate into current line
            print(t)                  # to control the initial 52 lines

        while True:
            t = b'\0'                     # struct only knows how to process 4-byte ints,
            for _ in range(3):            # so build one starting with a null byte
                t += f1.read(1)
            # print(t)
            if len(t) == 1:               # reached end of file
                break
            if len(t) < 4:                # reached end of file with an incomplete value
                print("Remaining bytes at end of file", t)
                break
            # the trick is that floor division by 256 drops the initial 0 byte and keeps the sign
            value = struct.unpack('<i', t)[0] // 256   # // for Python 3, only / for Python 2
            print(value, hex(value))                   # or any other more useful processing
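    As an alternative not used in this answer: the difficulty with mixing text and binary reads applies to text mode. A file opened with mode='rb' also has readline(), which returns the bytes up to and including b'\n' and leaves the position just after them, so the header can be skipped without a byte-by-byte loop. A minimal sketch, assuming every header line ends with b'\n'; the helper name skip_header and the demo content are hypothetical:

```python
import io


def skip_header(fd, nlines):
    """Skip nlines b'\\n'-terminated lines of a binary file object (hypothetical helper)."""
    header = []
    for _ in range(nlines):
        line = fd.readline()          # bytes, including the trailing b'\n'
        if not line.endswith(b'\n'):  # end of file, or unterminated line, inside the header
            raise EOFError("file ended before the end of the header")
        header.append(line)
    return header


# Simulated file: a 2-line header followed by 3 bytes of data.
demo = io.BytesIO(b'<html>\n</html>\n\x01\x00\x00')
skip_header(demo, 2)
print(demo.read())   # the remaining binary data
```

    If the last header line does not end with b'\n' (see the remark below), this helper alone is not enough.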
    

    Remark: the code above assumes that your description of 52 lines (each terminated by an end of line) is accurate, but the image you showed suggests that the last line is not terminated. In that case, you should first skip 51 lines and then skip the content of the last line.

    def skiplines(fd, nlines, eol):
        for _ in range(nlines):       # process nlines lines
            t = b''                   # to store the content of each line
            while True:
                x = fd.read(1)        # one byte at a time
                if x == eol:          # ok we have one full line
                    break
                if not x:             # end of file reached before eol
                    raise EOFError("file ended before the expected end of line")
                t += x                # else concatenate into current line
            # print(t)                # to control the skipped lines

    with open('/Users/anais/Documents/CR_lab/Lab_files/labtest.raw', mode='rb') as f1:
        skiplines(f1, 51, b'\n')     # skip 51 lines terminated with a \n
        skiplines(f1, 1, b'>')       # skip last line assuming it ends at the >

        ...
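    The question asks for an ASCII file, while the code above only prints the values. A minimal sketch, assuming the header has already been skipped and that one decimal integer per line is an acceptable ASCII format; the helper name write_ascii and the output file name are hypothetical. It uses int.from_bytes, which decodes 3 bytes directly, instead of the struct trick:

```python
import io
import os
import tempfile


def write_ascii(fd, out_path):
    """Write little-endian signed 24-bit samples from fd as decimal text.

    Hypothetical helper: fd must be a binary file object already positioned
    just after the header. One integer is written per line.
    Returns the number of samples written.
    """
    count = 0
    with open(out_path, 'w') as out:
        while True:
            three = fd.read(3)
            if len(three) < 3:
                if three:  # incomplete value at end of file
                    print("Remaining bytes at end of file", three)
                break
            out.write(f"{int.from_bytes(three, 'little', signed=True)}\n")
            count += 1
    return count


# Usage on in-memory data only (no header): samples 1, -1 and -8388608.
data = io.BytesIO(b'\x01\x00\x00' + b'\xff\xff\xff' + b'\x00\x00\x80')
out_path = os.path.join(tempfile.mkdtemp(), 'labtest.txt')
n = write_ascii(data, out_path)
```

    With the real file, the same call would go inside the `with open(...)` block once the header lines have been skipped.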