python numpy gdal satellite-image rasterio

Tradeoffs between indexing numpy array and opening file in rasterio

With rasterio, I can get a single band of a raster in either of the following ways:

import rasterio
import numpy as np

dataset = rasterio.open('filepath')

# Option 1: read every band into memory, then index the array (NumPy indexes are 0-based)
image = dataset.read()
print(image.shape)
red_band = image[2, :, :]  # third band, at index 2
print(red_band.shape)

# Option 2: read only the third band from the file (rasterio band indexes are 1-based)
red_band_read = dataset.read(3)
print(red_band_read.shape)

if np.array_equal(red_band_read, red_band):
    print('They are the same.')

And it will print out:

(8, 250, 250)
(250, 250)
(250, 250)
They are the same.

But I'm curious which is 'better'? I assume indexing into a NumPy array is much faster than reading from a file, but holding some of these large satellite images fully in memory is prohibitively memory-intensive. Are there any good reasons to prefer one over the other?

Solution

You might try timing each method and see if there is a difference!

If all you need is the data from the red band, I would certainly use the latter method (dataset.read(3)) rather than reading all bands into memory and then slicing the red band off the larger array.

In a similar vein, if you already know which subset of the data you want to look at, you can use rasterio's windowed reading and writing to reduce memory consumption further.