so There is a 4GB .TIF image that needs to be processed, as a memory constraint I can't load the whole image into numpy array so I need to load it lazily in parts from hard disk. so basically I need and that needs to be done in python as the project requirement. I also tried looking for tifffile library in PyPi tifffile but I found nothing useful please help.
pyvips can do this. For example:
import sys
import numpy as np
import pyvips
image = pyvips.Image.new_from_file(sys.argv[1], access="sequential")
for y in range(0, image.height, 100):
area_height = min(image.height - y, 100)
area = image.crop(0, y, image.width, area_height)
array = np.ndarray(buffer=area.write_to_memory(),
dtype=np.uint8,
shape=[area.height, area.width, area.bands])
The access
option to new_from_file
turns on sequential mode: pyvips will only load pixels from the file on demand, with the restriction that you must read pixels out top to bottom.
The loop runs down the image in blocks of 100 scanlines. You can tune this, of course.
I can run it like this:
$ vipsheader eso1242a-pyr.tif
eso1242a-pyr.tif: 108199x81503 uchar, 3 bands, srgb, tiffload_stream
$ /usr/bin/time -f %M:%e ./sections.py ~/pics/eso1242a-pyr.tif
273388:479.50
So on this sad old laptop it took 8 minutes to scan a 108,000 x 82,000 pixel image and needed a peak of 270mb of memory.
What processing are you doing? You might be able to do the whole thing in pyvips. It's quite a bit quicker than numpy.