libvips / pyvips access small sections of a multi-channel tiff (OME-Tiff)

Wondering if there's a speedy way to return specific pixel ranges of a given channel of an OME-TIFF file using pyvips / libvips. The crop operation doesn't allow for channel specifics.

My OME-Tiff is large (10 GB+) so I don't want to load the entire image into memory.

Open to any suggestions and/or other workflows.

Solution

pyvips supports multipage documents as "toilet-roll" images (sorry). You set n=-1 to load all the pages, and they appear as a very tall, thin image, with the pages stacked vertically. The metadata item page-height gives the height in pixels of each sheet.

Docs here:

https://libvips.github.io/libvips/API/current/VipsForeignSave.html#vips-tiffload

For example:

$ vipsheader -a multi-channel-z-series.ome.tif 
multi-channel-z-series.ome.tif: 439x167 char, 1 band, b-w, tiffload
width: 439
height: 167
bands: 1
format: char
coding: none
interpretation: b-w
xoffset: 0
yoffset: 0
xres: 0
yres: 0
filename: multi-channel-z-series.ome.tif
vips-loader: tiffload
n-pages: 15
image-description: <?xml version="1.0" encoding="UTF-8"?><!-- Warning: this comment is an OME-XML metadata block, which contains crucial dimensional parameters and other important metadata. Please edit cautiously (if at all), and back up the original data before doing so...
resolution-unit: cm
orientation: 1

You can see this is a 15 page OME image. pyvips will load page 0 by default, and each page is 439 by 167 pixels. You can fetch the XML in image-description to see the full OME channel metadata.

$ vipsheader -f image-description multi-channel-z-series.ome.tif
<?xml version="1.0" encoding="UTF-8"?>
<!--- ... etc.
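To work out which page holds a given channel, you can read the Pixels element of that XML. This is only a sketch: it assumes the file stores one plane per page, in the order given by the DimensionOrder attribute, which is the usual layout but is not guaranteed (TiffData elements can override it). The helper names pixels_attributes and page_index are made up for this example.

```python
import xml.etree.ElementTree as ET


def pixels_attributes(ome_xml):
    """Return the attributes of the first Pixels element in an OME-XML string."""
    root = ET.fromstring(ome_xml)
    for element in root.iter():
        # Strip the XML namespace, which differs between OME schema versions.
        if element.tag.rsplit("}", 1)[-1] == "Pixels":
            return dict(element.attrib)
    raise ValueError("no Pixels element found")


def page_index(attrs, c=0, z=0, t=0):
    """Page number of plane (c, z, t), assuming pages follow DimensionOrder."""
    sizes = {axis: int(attrs["Size" + axis]) for axis in "CZT"}
    position = {"C": c, "Z": z, "T": t}
    index, stride = 0, 1
    # The letters after "XY" run from the fastest- to the slowest-varying axis.
    for axis in attrs["DimensionOrder"][2:]:
        if not 0 <= position[axis] < sizes[axis]:
            raise IndexError(f"{axis} index {position[axis]} out of range")
        index += position[axis] * stride
        stride *= sizes[axis]
    return index


# Example use with pyvips (x is an image loaded as in the answer):
#   attrs = pixels_attributes(x.get("image-description"))
#   page = page_index(attrs, c=2, z=1)
```

Check the result against a channel you can recognise by eye before relying on it for a whole dataset.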

In Python you can do:

$ python3
Python 3.8.5 (default, Jul 28 2020, 12:59:40) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyvips
>>> x = pyvips.Image.new_from_file("multi-channel-z-series.ome.tif", n=-1)
>>> x.width
439
>>> x.height
2505
>>> x.get("page-height")
167
>>> x.height / x.get("page-height")
15.0

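So you can use crop to fetch a rect from a channel in the obvious way: page N starts at y = N * page-height in the tall image, so add that offset to the top of your rectangle.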
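A minimal sketch of that crop, assuming the image was loaded with n=-1 as above. The helper names page_rect and crop_page are hypothetical, and which page corresponds to which channel depends on the OME metadata for your file.

```python
def page_rect(page, page_height, left, top, width, height):
    """Map a rectangle inside one page to coordinates in the tall image."""
    if top < 0 or top + height > page_height:
        raise ValueError("rectangle crosses a page boundary")
    # Pages are stacked vertically, so only the y coordinate changes.
    return left, page * page_height + top, width, height


def crop_page(image, page, left, top, width, height):
    """Crop a rectangle from one page of an image loaded with n=-1."""
    page_height = image.get("page-height")
    return image.crop(*page_rect(page, page_height, left, top, width, height))


# Example use (needs pyvips and the file from the answer):
#   import pyvips
#   image = pyvips.Image.new_from_file("multi-channel-z-series.ome.tif", n=-1)
#   patch = crop_page(image, page=3, left=100, top=50, width=64, height=64)
#   patch.write_to_file("patch.tif")
```

If you only ever need one page, passing page=N to new_from_file instead of n=-1 loads just that page, and then crop needs no offset at all.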

Are you planning to generate patches for ML training? If you are, the fetch method on pyvips.Region can be much faster than crop for small patches. This issue has sample code and some benchmarks: in that example, crop takes 41s to make 12,000 32x32 patches, but fetch takes just 0.5s.
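Something like this would combine fetch with the page offset. It is a sketch rather than the code from that issue: fetch_patches is a hypothetical helper, and it expects a region made with pyvips.Region.new(image) on an image loaded with n=-1.

```python
def fetch_patches(region, page_height, page, origins, size):
    """Yield (left, top, pixels) for square patches on one page.

    region is assumed to be a pyvips.Region over the tall n=-1 image;
    origins is an iterable of (left, top) positions within the page.
    """
    y_offset = page * page_height
    for left, top in origins:
        if top < 0 or top + size > page_height:
            raise ValueError("patch crosses a page boundary")
        # fetch returns the raw pixel data for the requested rectangle.
        yield left, top, region.fetch(left, y_offset + top, size, size)


# Example use (needs pyvips and the file from the answer):
#   import pyvips
#   image = pyvips.Image.new_from_file("multi-channel-z-series.ome.tif", n=-1)
#   region = pyvips.Region.new(image)
#   origins = [(0, 0), (32, 32), (64, 64)]
#   for left, top, pixels in fetch_patches(
#           region, image.get("page-height"), 3, origins, 32):
#       print(left, top, len(pixels))
```

Reusing one region for every patch is the point here, since it avoids building a new crop pipeline for each rectangle.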