Tags: python, numpy, image-processing, python-imaging-library, vips

How can I load a single TIFF image into a NumPy array in parts, without loading the whole image into memory?


I have a 4 GB .TIF image that needs to be processed. Because of memory constraints I can't load the whole image into a NumPy array, so I need to load it lazily, in parts, from the hard disk. This has to be done in Python, as it is a project requirement. I also looked at the tifffile library on PyPI but found nothing useful. Please help.


Solution

  • pyvips can do this. For example:

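    import sys
    import numpy as np
    import pyvips
    
    # Sequential access: pixels are decoded on demand, top to bottom.
    image = pyvips.Image.new_from_file(sys.argv[1], access="sequential")
    
    # Walk down the image in strips of up to 100 scanlines.
    for y in range(0, image.height, 100):
        area_height = min(image.height - y, 100)
        area = image.crop(0, y, image.width, area_height)
        # Wrap the strip's pixels as a NumPy array (dtype assumes an 8-bit image).
        array = np.ndarray(buffer=area.write_to_memory(),
                           dtype=np.uint8,
                           shape=[area.height, area.width, area.bands])
        # ... process `array` here ...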
    

    The access option to new_from_file turns on sequential mode: pyvips will only load pixels from the file on demand, with the restriction that you must read pixels out top to bottom.

    The loop runs down the image in blocks of 100 scanlines. You can tune this, of course.
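
If you want the strip height to be a parameter, the loop above could be wrapped in a generator along these lines. This is only a sketch: `strip_ranges`, `iter_strips` and `FORMAT_TO_DTYPE` are names made up for illustration, not part of pyvips, and the dtype lookup assumes `image.format` reports one of the usual band format names (for example `uchar` for 8-bit data).

```python
# Sketch only: strip_ranges, iter_strips and FORMAT_TO_DTYPE are
# illustrative names, not part of pyvips.

# pyvips band format name -> NumPy dtype name.
FORMAT_TO_DTYPE = {
    "uchar": "uint8",
    "char": "int8",
    "ushort": "uint16",
    "short": "int16",
    "uint": "uint32",
    "int": "int32",
    "float": "float32",
    "double": "float64",
    "complex": "complex64",
    "dpcomplex": "complex128",
}


def strip_ranges(height, strip_height=100):
    """Yield (top, rows) pairs covering rows 0..height-1, top to bottom."""
    if strip_height <= 0:
        raise ValueError("strip_height must be positive")
    for top in range(0, height, strip_height):
        # The last strip may be shorter than strip_height.
        yield top, min(strip_height, height - top)


def iter_strips(filename, strip_height=100):
    """Yield (top, array) for each horizontal strip of the image."""
    # Imported lazily so strip_ranges can be used without pyvips installed.
    import numpy as np
    import pyvips

    image = pyvips.Image.new_from_file(filename, access="sequential")
    dtype = np.dtype(FORMAT_TO_DTYPE[image.format])
    for top, rows in strip_ranges(image.height, strip_height):
        area = image.crop(0, top, image.width, rows)
        array = np.ndarray(buffer=area.write_to_memory(),
                           dtype=dtype,
                           shape=[rows, image.width, image.bands])
        yield top, array
```

You would then write something like `for top, strip in iter_strips(path, 256): ...`, where `path` is your own file. Because the strips are still requested strictly top to bottom, this stays within the sequential-mode restriction.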

    I can run it like this:

    $ vipsheader eso1242a-pyr.tif 
    eso1242a-pyr.tif: 108199x81503 uchar, 3 bands, srgb, tiffload_stream
    $ /usr/bin/time -f %M:%e ./sections.py ~/pics/eso1242a-pyr.tif
    273388:479.50
    

    So on this sad old laptop it took about 8 minutes to scan a roughly 108,000 x 82,000 pixel image, and it needed a peak of about 270 MB of memory.
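
To put those numbers in context, here is a rough back-of-the-envelope calculation of the uncompressed pixel sizes involved, using the dimensions reported by vipsheader. It is only arithmetic: `raw_bytes` is a helper name invented for this sketch, and it assumes 1 byte per sample (uchar) and that `%M` from GNU time is the peak resident set size in kilobytes.

```python
def raw_bytes(width, rows, bands, bytes_per_sample=1):
    """Uncompressed size in bytes of a block of pixels."""
    return width * rows * bands * bytes_per_sample


# Width x height and band count from the vipsheader output; uchar = 1 byte.
WIDTH, HEIGHT, BANDS = 108199, 81503, 3

full_image = raw_bytes(WIDTH, HEIGHT, BANDS)  # whole image as one array
one_strip = raw_bytes(WIDTH, 100, BANDS)      # one 100-scanline strip

peak_rss_kb = 273388  # %M from the time output above
elapsed_s = 479.50    # %e from the time output above

print(f"whole image: {full_image / 1e9:.1f} GB uncompressed")
print(f"one strip:   {one_strip / 1e6:.1f} MB")
print(f"peak memory: {peak_rss_kb / 1024:.0f} MiB")
print(f"elapsed:     {elapsed_s / 60:.1f} minutes")
```

The whole image would be about 26 GB as a single uint8 array, while each 100-scanline strip is only about 32 MB, so the strip height is the main knob for trading memory against the number of loop iterations.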

    What processing are you doing? You might be able to do the whole thing in pyvips. It's quite a bit quicker than numpy.