
Reading a binary file: translating MATLAB to Python


I'm trying to translate working MATLAB code that reads a binary file into Python. Is there an equivalent for the following?

% open the file for reading
fid=fopen (filename,'rb','ieee-le');
% first read the signature
tmp=fread(fid,2,'char');
% read sizes
rows=fread(fid,1,'ushort');
cols=fread(fid,1,'ushort');

Solution

  • There's the struct module for that, specifically the unpack function, which accepts a buffer; you'll have to read the required number of bytes from the file first, which struct.calcsize gives you:

    import struct

    endian = "<"  # little-endian, matching MATLAB's 'ieee-le'
    with open(filename, 'rb') as f:
        # first read the two signature bytes
        tmp = struct.unpack(f"{endian}cc", f.read(struct.calcsize("cc")))
        tmp_int = [int.from_bytes(x, byteorder="little") for x in tmp]
        # then read the sizes as unsigned shorts
        rows = struct.unpack(f"{endian}H", f.read(struct.calcsize("H")))[0]
        cols = struct.unpack(f"{endian}H", f.read(struct.calcsize("H")))[0]
        # keep reading the rest of the data in here: the file is
        # closed as soon as the with block exits
    
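    As an aside, the whole six-byte header can also be decoded in a single call; this sketch assumes the two signature bytes are fine kept together as one bytes object (the "2s" format):

    import struct

    header_fmt = "<2sHH"  # little-endian: 2 signature bytes, then two unsigned shorts
    with open(filename, 'rb') as f:
        sig, rows, cols = struct.unpack(header_fmt, f.read(struct.calcsize(header_fmt)))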

    You might want to use the struct.Struct class for reading the rest of the data in chunks, as it is going to be faster than decoding numbers one at a time, e.g.:

    data = []
    reader = struct.Struct(endian + "i" * cols)  # one 'i' (4-byte int) per column
    row_size = reader.size
    for row_number in range(rows):  # still inside the with block above
        row = reader.unpack(f.read(row_size))
        data.append(row)
    

    Edit: corrected the answer and added an example for larger chunks.

    Edit 2: one more improvement. Assuming we are reading a 1 GB file of shorts, storing them as Python ints makes no sense and will most likely cause an out-of-memory error (or the system will freeze); the proper way to do it is with numpy:

    import numpy as np

    # read the remaining values as little-endian unsigned shorts, while the
    # file is still open and positioned just past the header
    data = np.fromfile(f, dtype=endian + 'H').reshape(rows, cols)
    

    This way the data takes the same amount of space in memory as it does on disk.
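
    Putting it together, here is a minimal end-to-end sketch under the same assumptions (unsigned shorts stored row by row; the actual element type is not shown in the question):

    import struct
    import numpy as np

    with open(filename, 'rb') as f:
        # 2 signature bytes followed by two little-endian unsigned shorts
        signature, rows, cols = struct.unpack("<2sHH", f.read(6))
        # the rest of the file, reshaped into a rows x cols array
        data = np.fromfile(f, dtype="<H").reshape(rows, cols)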