Search code examples
pythontiffrasterio

Display specific part of tiff image using rasterio without having to load the entire file


I have a large tiff file (around 2GB) containing a map. I have been able to successfully read the data and even display it using the following python code:

import rasterio
from rasterio.plot import show

with rasterio.open("image.tif") as img:
    show(img)
    data = img.read()

This works just fine. However, I need to be able to display specific parts of this map without having to load the entire file into memory (as it takes up too much of the RAM and is not doable on many other PCs). I tried using the Window class of rasterio in order to that, but when I tried to display the map the outcome was different from how the full map is displayed (as if it caused data loss):

import rasterio
from rasterio.plot import show
from rasterio.windows import Window

with rasterio.open("image.tif") as img:
    data = img.read(window=Window(0, 0, 100000, 100000))
    show(data)

So my question is, how can I display a part of the map without having to load into memory the entire file, while also making it look as if it had been cropped from the full map image?

thanks in advance :)


Solution

  • The reason that it displays nicely in the first case, but not in the second, is that in the first case you pass an instance of rasterio.DatasetReader to show (show(img)), but in the second case you pass in a numpy array (show(data)). The DatasetReader contains additional information, in particular an affine transformation and color interpretation, which show uses.

    The additional things show does in the first case (for RGB data) can be recreated for the windowed case like so:

    import rasterio
    from rasterio.enums import ColorInterp
    from rasterio.plot import show
    from rasterio.windows import Window
    
    with rasterio.open("image.tif") as img:
        window = Window(0, 0, 100000, 100000)
    
        # Lookup table for the color space in the source file
        source_colorinterp = dict(zip(img.colorinterp, img.indexes))
    
        # Read the image in the proper order so the numpy array will have the colors in the
        # order expected by matplotlib (RGB)
        rgb_indexes = [
            source_colorinterp[ci]
            for ci in (ColorInterp.red, ColorInterp.green, ColorInterp.blue)
        ]
        data = img.read(rgb_indexes, window=window)
    
        # Also pass in the affine transform corresponding to the window in order to
        # display the correct coordinates and possibly orientation
        show(data, transform=img.window_transform(window))
    

    (I figured out what show does by looking at the source code here)


    In case of data with a single channel, the underlying matplotlib library used for plotting scales the color range based on the min and max value of the data. To get exactly the same colors as before, you'll need to know the min and max of the whole image, or some values that come reasonably close.

    Then you can explicitly tell matplotlib's imshow how to scale:

    with rasterio.open("image.tif") as img:
        window = Window(0, 0, 100000, 100000)
        data = img.read(window=window, masked=True)
    
        # adjust these
        value_min = 0
        value_max = 255
    
        show(data, transform=img.window_transform(window), vmin=value_min, vmax=value_max)
    

    Additional kwargs (like vmin and vmax here) will be passed on to matplotlib.axes.Axes.imshow, as documented here. From the matplotlib documenation:

    vmin, vmax: float, optional
    When using scalar data and no explicit norm, vmin and vmax define the data range that the colormap covers. By default, the colormap covers the complete value range of the supplied data. It is deprecated to use vmin/vmax when norm is given. When using RGB(A) data, parameters vmin/vmax are ignored.

    That way you could also change the colormap it uses etc.