
Python: read part of a large binary file


I have a large binary file (about 2.5 GB). It contains a 336-byte header followed by seismic signal data (x, y and z channels) stored as int32. The file holds 223,200,000 samples. I need to read only part of the signal. For example, I want the samples in the interval [216,000,000, 219,599,999]. I wrote this function:

import numpy as np

def reading(path, start_moment, end_moment):
    file_data = open(path, 'rb')
    if start_moment is not None:
        bytes_value = start_moment * 4 * 3
        file_data.seek(336 + bytes_value)
    else:
        file_data.seek(336)

    if end_moment is None:
        try:
            signals = np.fromfile(file_data, dtype=np.int32)
        except MemoryError:
            return None
        finally:
            file_data.close()
    else:
        moment_count = end_moment - start_moment + 1
        try:
            signals = np.fromfile(file_data, dtype=np.int32,
                                  count=moment_count * 3)
        except MemoryError:
            return None
        finally:
            file_data.close()
    channel_count = 3
    signal_count = signals.shape[0] // channel_count
    signals = np.reshape(signals, newshape=(signal_count, channel_count))
    return signals

When I run a script that calls this function in the PyCharm IDE, I get this error:

Traceback (most recent call last):
  File "D:/AppsBuilding/test/testReadBaikal8.py", line 41, in <module>
    signal_2 = reading(path=path, start_moment=216000000, end_moment=219599999)
  File "D:/AppsBuilding/test/testReadBaikal8.py", line 27, in reading
    count=moment_count * 3)
OSError: obtaining file position failed

But if I run the script with start_moment=7200000 and end_moment=10799999, everything works. My PC runs 32-bit Windows 7 and has 1.95 GB of memory. Please help me resolve this problem.
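
For reference, the byte offsets implied by the two calls can be checked with plain arithmetic. The sketch below uses only the numbers given in the question; the names `HEADER_SIZE`, `BYTES_PER_SAMPLE`, `INT32_MAX` and `byte_offset` are made up for illustration. It shows that the failing start offset is larger than the biggest value a signed 32-bit integer can hold, while the working one is not. That fits a 32-bit file-position limit somewhere in the read path, but this is an observation, not a confirmed diagnosis.

```python
HEADER_SIZE = 336          # bytes, from the question
BYTES_PER_SAMPLE = 4 * 3   # int32 (4 bytes) times 3 channels
INT32_MAX = 2**31 - 1      # largest value of a signed 32-bit integer


def byte_offset(sample_index):
    """Offset of the given sample from the start of the file, in bytes."""
    return HEADER_SIZE + sample_index * BYTES_PER_SAMPLE


# Compare the start offsets of the working and the failing call.
for start in (7_200_000, 216_000_000):
    offset = byte_offset(start)
    print(start, offset, offset > INT32_MAX)
```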


Solution

  • Read the file in small blocks, so that the memory used by each block can be freed once that block has been processed:

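    def read_in_block(file_path):
        BLOCK_SIZE = 1024  # bytes per block
        # Open in binary mode: the file holds raw int32 samples, not text.
        with open(file_path, "rb") as f:
            while True:
                block = f.read(BLOCK_SIZE)
                if block:
                    yield block
                else:
                    return  # end of file

    # Usage: handle one block at a time (file_path is the path to the binary file).
    for block in read_in_block(file_path):
        print(block)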
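
The block-reading idea above can be adapted to the sample range from the question. This is a minimal sketch, not a tested fix for the reported error. The function name `read_samples_in_blocks` and the `block_samples` parameter are hypothetical. The sketch assumes the data is stored in the machine's native byte order. It also assumes that Python's own `seek` and `read` handle offsets above 2 GB on the asker's system, which has not been verified on 32-bit Windows. It reads raw bytes and converts them with `np.frombuffer`, so `np.fromfile` is never called on the file object.

```python
import numpy as np

HEADER_SIZE = 336    # bytes, from the question
CHANNEL_COUNT = 3    # x, y and z channels
DTYPE = np.dtype(np.int32)  # assumption: native byte order


def read_samples_in_blocks(path, start, end, block_samples=100_000):
    """Yield (n, 3) int32 arrays covering samples start..end inclusive."""
    bytes_per_sample = DTYPE.itemsize * CHANNEL_COUNT
    remaining = end - start + 1
    with open(path, "rb") as f:
        # Skip the header and all samples before the requested range.
        f.seek(HEADER_SIZE + start * bytes_per_sample)
        while remaining > 0:
            want = min(block_samples, remaining)
            raw = f.read(want * bytes_per_sample)
            got = len(raw) // bytes_per_sample  # whole samples actually read
            if got == 0:
                break  # reached the end of the file early
            # frombuffer wraps the bytes without copying (read-only array).
            block = np.frombuffer(raw[:got * bytes_per_sample], dtype=DTYPE)
            yield block.reshape(got, CHANNEL_COUNT)
            remaining -= got
```

Each yielded block can be processed and then dropped. The requested range holds 3,600,000 samples of 12 bytes each, about 43 MB, so the blocks could also be joined with `np.concatenate(list(read_samples_in_blocks(path, 216000000, 219599999)))` if that much memory is available.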