Resize huge images in Python (bigger than available RAM)


I would like to resize and split huge (1 TB) images into 256x256-pixel tiles (Zoomify / OSM / Google Maps / XYZ schema). The images would be in BigTIFF or PSB (Large Document Format, also known as Photoshop Big) format.

Which libraries can do this? I had a look at GDAL, but it produced quite blurry output and I couldn't find a way to make it interpolate better. Ideally I'd use a Lanczos interpolator for such a task.

Are there any native Python libraries, or wrappers for C-based libraries, which can do this? Can the Python wrapper for ImageMagick do such a thing?

If no Python library is available, I'm also open to command-line tools which I can automate using Python.


Solution

  • libvips can process huge (larger than RAM) images efficiently. It's a streaming image processing library, so it can (in this case) decompress, resize, tile, and write all at the same time, and without having the whole image in memory or needing any temporary files.

    The dzsave operator will write a DeepZoom / Zoomify / Google Maps pyramid. You can run it from the command line like this:

    $ vipsheader y.tif
    y.tif: 104341x105144 uchar, 3 bands, srgb, tiffload
    $ ls -l y.tif
    -rw-r--r-- 1 john john 32912503796 Jun 13 13:31 y.tif
    $ time vips dzsave y.tif x
    real    3m4.944s
    user    9m21.372s
    sys     7m20.232s
    peak RES: 640mb
    $ ls -R x_files/ | wc
     227190  227172 2784853
    

    So on my desktop it converted a 32 GB image to roughly 230,000 tiles in about 3 minutes. That's with a mechanical HDD; it might be quicker with an SSD. There's a chapter in the docs introducing dzsave.
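
    As a rough cross-check of that tile count, here is a minimal sketch of the pyramid arithmetic. It assumes the default DeepZoom layout with a 254-pixel tile step, and that each level is half the size of the previous one (rounded up) until the image is 1x1. `count_tiles` is a hypothetical helper written for this illustration, not part of libvips.

    ```python
    def count_tiles(width, height, tile_step=254):
        """Return (levels, tiles) for a pyramid that halves down to 1x1.

        tile_step=254 is an assumption matching the default DeepZoom layout.
        """
        levels = 0
        tiles = 0
        while True:
            # Ceiling division: a partial tile at the edge still counts.
            across = -(-width // tile_step)
            down = -(-height // tile_step)
            tiles += across * down
            levels += 1
            if width == 1 and height == 1:
                break
            # Next level is half the size, rounded up.
            width = (width + 1) // 2
            height = (height + 1) // 2
        return levels, tiles


    # Dimensions reported by vipsheader for y.tif
    print(count_tiles(104341, 105144))  # (18, 227135)
    ```

    Under those assumptions the estimate is 227,135 tiles over 18 levels, which is in line with the `ls -R` listing above.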

    libvips has a Python binding, pyvips, so you could also write:

    import pyvips
    
    image = pyvips.Image.new_from_file("y.tif", access="sequential")
    image.dzsave("x")
    

    The access option tells libvips that it should stream the image. libvips can read both BigTIFF and PSB, but you'll find BigTIFF is a lot quicker and needs much less memory.
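
The call above uses dzsave's default layout. Since the question asks for 256x256 tiles in a Zoomify or Google Maps style layout, the following is a hedged sketch of how that might be requested through dzsave's `layout`, `tile_size`, `overlap` and `suffix` options. The helper names `tile_options` and `write_pyramid`, the JPEG quality value and the file names are illustrative assumptions, not something from the original answer.

```python
def tile_options(layout="google", tile_size=256):
    """Keyword arguments for dzsave; the values are illustrative choices."""
    return {
        "layout": layout,        # "dz" (the default), "zoomify" or "google"
        "tile_size": tile_size,  # tile edge length in pixels
        "overlap": 0,            # no shared border between neighbouring tiles
        "suffix": ".jpg[Q=90]",  # tile format plus save options (assumed quality)
    }


def write_pyramid(src, dst, **options):
    """Stream src through libvips and write a tile pyramid to dst."""
    # Imported here so tile_options() can be used without libvips installed.
    import pyvips

    # access="sequential" asks libvips to stream the file rather than load it.
    image = pyvips.Image.new_from_file(src, access="sequential")
    image.dzsave(dst, **options)
```

With libvips and pyvips installed, a call such as `write_pyramid("y.tif", "x", **tile_options())` should write a Google Maps style pyramid; pass `layout="zoomify"` to `tile_options` for a Zoomify one.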