I have a large binary file (about 2.5 GB). It contains a 336-byte header followed by seismic signal data (x, y and z channels) of type int32. The number of discretes is 223,200,000. I need to read part of the signal, for example the discretes in the interval [216 000 000, 219 599 999]. I wrote this function:
```python
import numpy as np

def reading(path, start_moment, end_moment):
    file_data = open(path, 'rb')
    if start_moment is not None:
        # each discrete = 3 channels x 4 bytes (int32)
        bytes_value = start_moment * 4 * 3
        file_data.seek(336 + bytes_value)  # skip the 336-byte header
    else:
        file_data.seek(336)
    if end_moment is None:
        try:
            signals = np.fromfile(file_data, dtype=np.int32)
        except MemoryError:
            return None
        finally:
            file_data.close()
    else:
        moment_count = end_moment - start_moment + 1
        try:
            signals = np.fromfile(file_data, dtype=np.int32,
                                  count=moment_count * 3)
        except MemoryError:
            return None
        finally:
            file_data.close()
    channel_count = 3
    signal_count = signals.shape[0] // channel_count
    # one row per discrete, one column per channel (x, y, z)
    signals = np.reshape(signals, (signal_count, channel_count))
    return signals
```
When I run the script with this function in PyCharm, I get this error:
```
Traceback (most recent call last):
  File "D:/AppsBuilding/test/testReadBaikal8.py", line 41, in <module>
    signal_2 = reading(path=path, start_moment=216000000, end_moment=219599999)
  File "D:/AppsBuilding/test/testReadBaikal8.py", line 27, in reading
    count=moment_count * 3)
OSError: obtaining file position failed
```
But if I run the script with start_moment=7200000, end_moment=10799999, everything works. My PC runs Windows 7 32-bit with 1.95 GB of RAM. Please help me resolve this problem.
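A quick check of the byte offsets the function seeks to may help narrow this down. This is a sketch, and the explanation is a hypothesis rather than something confirmed here: the interval that works starts well below 2**31 - 1 bytes, while the failing one starts past that limit. On 32-bit Windows, a C `long` file position is 32 bits wide, so a position query made inside `np.fromfile` could plausibly fail there. The names `HEADER_SIZE`, `BYTES_PER_DISCRETE`, `INT32_MAX` and `byte_offset` are illustrative only.

```python
HEADER_SIZE = 336               # bytes, from the question
BYTES_PER_DISCRETE = 4 * 3      # three int32 channels (x, y, z)
INT32_MAX = 2**31 - 1           # largest value a signed 32-bit position can hold

def byte_offset(discrete):
    """Byte position in the file where the given discrete starts."""
    return HEADER_SIZE + discrete * BYTES_PER_DISCRETE

for start in (7_200_000, 216_000_000):
    off = byte_offset(start)
    print(start, off, off > INT32_MAX)
# 7200000 86400336 False
# 216000000 2592000336 True

print("file size:", byte_offset(223_200_000))
# file size: 2678400336
```

The computed file size matches the roughly 2.5 GB stated in the question, which suggests the layout assumptions (336-byte header, 12 bytes per discrete) are consistent.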
Divide the file into small segments and process them one at a time, so that memory is freed after each piece has been processed:
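```python
def read_in_block(file_path):
    BLOCK_SIZE = 1024
    # "rb": the file is binary, so open it in binary mode
    with open(file_path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if block:
                yield block
            else:
                return

for block in read_in_block(path):
    print(block)
```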
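The same block-by-block idea can be applied to the seismic file itself. The sketch below is an assumption-laden adaptation, not code from this thread: it seeks with Python's own file object, reads raw bytes in chunks, and converts each chunk with `np.frombuffer`, so `np.fromfile` is never asked for the file position. It assumes the layout from the question (336-byte header, three native-endian int32 channels per discrete). The function name `read_discretes_in_blocks` and the `block_discretes` parameter are hypothetical.

```python
import numpy as np

HEADER_SIZE = 336                        # bytes, from the question
CHANNEL_COUNT = 3                        # x, y, z
BYTES_PER_DISCRETE = 4 * CHANNEL_COUNT   # int32 per channel

def read_discretes_in_blocks(path, start_moment, end_moment, block_discretes=1_000_000):
    """Yield (n, 3) int32 arrays covering discretes start_moment..end_moment inclusive."""
    remaining = end_moment - start_moment + 1
    with open(path, "rb") as f:
        # Python's file object does the seek; numpy never touches the position
        f.seek(HEADER_SIZE + start_moment * BYTES_PER_DISCRETE)
        while remaining > 0:
            n = min(block_discretes, remaining)
            raw = f.read(n * BYTES_PER_DISCRETE)
            got = len(raw) // BYTES_PER_DISCRETE  # whole discretes actually read
            if got == 0:
                break  # nothing left in the file
            usable = raw[:got * BYTES_PER_DISCRETE]
            yield np.frombuffer(usable, dtype=np.int32).reshape(got, CHANNEL_COUNT)
            remaining -= got
            if got < n:
                break  # reached end of file before end_moment
```

Usage, reading the interval from the question (3,600,000 discretes, about 43 MB) into one array:

```python
signal_2 = np.concatenate(list(read_discretes_in_blocks(path, 216000000, 219599999)))
```

Each yielded block is a read-only view of its bytes chunk; call `.copy()` on a block if it needs to be modified.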