Tags: python, image, image-processing, tiff

Downsample large .tif images while reading


I am working with hundreds of large high resolution .tif images that are very memory intensive to read in Python. Fortunately, I can often work with low resolution versions of these images by downsampling them after loading them. I am wondering if there is a way to only read part of the image into memory instead of the whole image to improve read speed.

The code below shows an example of what I would like; however, it still reads the whole image into memory before returning the downsampled array. Is it possible to read only every nth pixel value into memory to improve read speed?

from tifffile import imread

def standardOpen(f):
    im = imread(f)
    return im

def scaledOpen(f):
    # keep every 4th pixel along the last two axes
    im = imread(f)[:, ::4, ::4]
    return im

f_path = '/file_name.tif'

im = standardOpen(f_path)
print(im.shape)
>>(88, 2048, 2048)

im_scaled = scaledOpen(f_path)
print(im_scaled.shape)
>>(88, 512, 512)

EDIT: I have uploaded a sample image to dropbox: https://www.dropbox.com/s/xkm0bzudcv2sw5d/S000_t000002_V000_R0000_X000_Y000_C02_I1_D0_P00101.tif?dl=0

This image has 101 slices of 2048x2048 pixels. When I read it using tifffile.imread(image_path) I get a numpy array of shape (101, 2048, 2048)
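For reference, the file's layout (number of pages, per-page shape, dtype, compression) can be verified without decoding any pixel data by opening it with tifffile.TiffFile. A minimal sketch; it uses a small synthetic stand-in file (stack.tif, a hypothetical name) rather than the sample from Dropbox:

```python
import numpy
import tifffile

# stand-in for the sample file: a small uncompressed multi-page TIFF
tifffile.imwrite('stack.tif', numpy.zeros((5, 64, 64), dtype=numpy.uint16))

# TiffFile parses only the metadata, not the pixel data
with tifffile.TiffFile('stack.tif') as tif:
    print(len(tif.pages))          # 5 pages (slices)
    page = tif.pages[0]
    print(page.shape, page.dtype)  # (64, 64) uint16
    print(page.compression)        # compression scheme of the page data
```

On the sample file this reports 101 pages of 2048x2048 uint16, matching the array shape from tifffile.imread.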


Solution

  • The sample file, S000_t000002_V000_R0000_X000_Y000_C02_I1_D0_P00101.tif, is a multi-page TIFF. The image data in each page is stored uncompressed in a single strip. To speed up reading sliced data from this specific kind of TIFF file, memory-map the frame data and copy the sliced data to a pre-allocated array while iterating over the pages in the file. Unless you need to preserve the noise characteristics, it is usually better to downsample using higher-order filtering, e.g. interpolation with OpenCV:

    import numpy
    import tifffile
    import cv2  # OpenCV for fast interpolation
    
    filename = 'S000_t000002_V000_R0000_X000_Y000_C02_I1_D0_P00101.tif'
    
    with tifffile.Timer():
        stack = tifffile.imread(filename)[:, ::4, ::4].copy()
    
    with tifffile.Timer():
        with tifffile.TiffFile(filename) as tif:
            page = tif.pages[0]
            shape = len(tif.pages), page.imagelength // 4, page.imagewidth // 4
            stack = numpy.empty(shape, page.dtype)
            for i, page in enumerate(tif.pages):
                stack[i] = page.asarray(out='memmap')[::4, ::4]
                # better: use interpolation instead of striding
                # stack[i] = cv2.resize(
                #     page.asarray(),
                #     dsize=(shape[2], shape[1]),
                #     interpolation=cv2.INTER_LINEAR,
                # )
    

    I would avoid this kind of micro-optimization for little speed gain. The image data in the sample file is only ~800 MB and easily fits into RAM on most computers.
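    If adding OpenCV as a dependency is undesirable, a numpy-only middle ground between raw striding and interpolation is block averaging (similar in spirit to cv2.INTER_AREA): each output pixel is the mean of a factor x factor block, which suppresses aliasing that plain striding can introduce. A minimal sketch; block_mean is a hypothetical helper, not part of tifffile:

    ```python
    import numpy

    def block_mean(frame, factor=4):
        """Downsample a 2D array by averaging factor x factor blocks."""
        h, w = frame.shape
        # trim edges so both dimensions are divisible by factor
        frame = frame[:h - h % factor, :w - w % factor]
        return frame.reshape(
            frame.shape[0] // factor, factor,
            frame.shape[1] // factor, factor,
        ).mean(axis=(1, 3))

    frame = numpy.arange(16, dtype=numpy.float64).reshape(4, 4)
    print(block_mean(frame, 2))  # [[2.5, 4.5], [10.5, 12.5]]
    ```

    In the per-page loop above, `stack[i] = block_mean(page.asarray())` would replace the strided assignment; note the result is floating point and would need casting back to the page dtype.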