
GDAL: write to zip archive


I know that, starting with version 2.2, GDAL has virtual filesystem drivers such as /vsizip/ for accessing zip archives.

Previously I've been able to read from zip archives, but I haven't managed to write to them (which I guess should also be possible). I didn't find much documentation about this either.

This is what I tried:

from osgeo import gdal
from zipfile import ZipFile
import numpy as np

with ZipFile('test.zip','a') as zfile:
    driver = gdal.GetDriverByName('GTiff')
    # Create() returns None on failure, which causes the error shown below
    raster = driver.Create('/vsizip/test.zip/testraster.tif',10,10, 1, gdal.GDT_Float32)
    raster.GetRasterBand(1).WriteArray(np.random.random((10,10)))
    raster = None  # dereference to close the dataset

But I'm always receiving

AttributeError: 'NoneType' object has no attribute 'GetRasterBand'

EDIT:

Updated the snippet as per Gabriella Giordano's suggestion.

EDIT 2:

Gabriella pointed out that the GTiff driver needs both read and write access at the same time, which is (currently?) not supported.

The suggested workaround is to create a temporary file and then copy it into the archive.
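For reference, a minimal sketch of that temporary-file workaround, using only the standard library. The helper name `add_via_tempfile` and its `write_func` callback are hypothetical, introduced here for illustration; the GDAL calls in the trailing comment are one possible callback, not part of the helper itself.

```python
import os
import shutil
import tempfile
import zipfile


def add_via_tempfile(zip_path, member_name, write_func):
    """Write a file to a temporary directory, then copy it into a zip archive.

    write_func is a hypothetical callback: it receives a filesystem path
    and must create the file at that path.
    """
    tmp_dir = tempfile.mkdtemp()
    try:
        tmp_path = os.path.join(tmp_dir, os.path.basename(member_name))
        write_func(tmp_path)  # e.g. a GDAL Create()/WriteArray() sequence
        # Mode 'a' appends to an existing archive or creates a new one.
        with zipfile.ZipFile(zip_path, 'a', zipfile.ZIP_DEFLATED) as zfile:
            zfile.write(tmp_path, arcname=member_name)
    finally:
        # Remove the temporary copy whether or not the write succeeded.
        shutil.rmtree(tmp_dir, ignore_errors=True)


# Possible callback using GDAL (assumes the osgeo bindings and numpy are installed):
#
# def write_raster(path):
#     from osgeo import gdal
#     import numpy as np
#     raster = gdal.GetDriverByName('GTiff').Create(path, 10, 10, 1, gdal.GDT_Float32)
#     raster.GetRasterBand(1).WriteArray(np.random.random((10, 10)))
#     raster = None  # close and flush the file
#
# add_via_tempfile('test.zip', 'testraster.tif', write_raster)
```

The raster is written with ordinary random access on the local disk, so the driver's read/write requirement never touches the archive.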

I'm really interested in creating a raster file directly inside the archive (it doesn't have to be a TIFF). If I wanted to copy files, I might as well use zipfile or the like.

Any ideas?

EDIT 3:

Full disclosure*, this is what I'm trying to achieve:

I have MODIS hdf files, and I want to directly save the value raster subdatasets into a zip archive.

Solution:

I've finally managed to do it with the excellent insight from Gabriella.

I ended up using gdal.Translate with this passed in to options:

gdal.TranslateOptions(creationOptions=['STREAMABLE_OUTPUT=TRUE'])
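Roughly, the call looks like the sketch below. The helper names `vsizip_path` and `translate_into_zip` are made up for illustration, and the GTiff output format is an assumption; `gdal.Translate` and `gdal.TranslateOptions` are the real GDAL Python functions. The import is done inside the function so that the path helper can be used even where GDAL is not installed.

```python
def vsizip_path(zip_path, member_name):
    """Build the GDAL /vsizip/ path for a member inside a zip archive."""
    return '/vsizip/{}/{}'.format(zip_path, member_name)


def translate_into_zip(src_name, zip_path, member_name):
    """Copy a GDAL-readable dataset into a zip archive as a streamable GeoTIFF."""
    from osgeo import gdal  # lazy import: requires the GDAL Python bindings

    options = gdal.TranslateOptions(
        format='GTiff',  # assumption: GeoTIFF output
        creationOptions=['STREAMABLE_OUTPUT=TRUE'],
    )
    out = gdal.Translate(vsizip_path(zip_path, member_name), src_name,
                         options=options)
    if out is None:
        raise RuntimeError('gdal.Translate failed for ' + src_name)
    out = None  # dereference to close the dataset and finish the archive member
```

For the MODIS case, `src_name` would be one of the subdataset names returned by `gdal.Open(hdf_file).GetSubDatasets()`, which yields `(name, description)` tuples.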


*I didn't think this was pertinent information, but Gabriella's answer makes me think it is.


Solution

  • I'm not really fluent in Python, but the dataset creation is probably failing because you are not setting the number of bands.

    Try this:

    raster = driver.Create('/vsizip/test.zip/testraster.tif',10,10, 1, gdal.GDT_Float32)
    

    Replace 1 with the desired number of bands.

    You can find the complete list of arguments of Create here.

    EDIT:

    Found the possible cause of the issue (and a possible workaround) here. As reported at this link, apparently the open mode of Create is incompatible with the vsizip virtual filesystem.

    Edit 2

    This is not a problem with the driver you use (e.g. GeoTIFF), but with the virtual file system itself. The Create call tries to open the file in read/write mode because GDAL uses the file system as a cache for performance reasons, not only for storage. In this case, however, the vsizip virtual file system does not allow reading from and writing to a file at the same time, because it does not support random access.

    The vsizip VFS does provide on-the-fly reads and writes, but only separately, because it compresses and decompresses the data for you in the background. This means that you cannot jump back and forth (as the random access required by the read/write open mode would need), because what is written to disk (compressed data) differs from what you would read back (uncompressed data).

    Edit 3

    I've found out that the GDAL command-line tools (like gdal_translate or gdalwarp) set 'STREAMABLE_OUTPUT=TRUE' automagically when working with virtual file systems. A streamable file format enforces a write-only policy, which should be compatible with file systems that lack random access. You can pass this option as an additional parameter to the Create call. Again, my limited Python skills prevent me from providing a working snippet.

    Please consider, though, that streamable output also implies some limitations on the kinds of operations you can perform on the dataset, so this is probably not going to fix your problem, but it is useful to know anyway...
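A rough sketch of passing the creation option to Create from Python follows. The function name `create_streamable_raster` and the constant `CREATION_OPTIONS` are made up for illustration; `Driver.Create` does accept an `options` list of creation options. Whether the call succeeds on a /vsizip/ path is an assumption that has not been verified here, and streamable output restricts the order in which data may be written.

```python
CREATION_OPTIONS = ['STREAMABLE_OUTPUT=TRUE']


def create_streamable_raster(path, array):
    """Write a 2-D array as a single-band Float32 GeoTIFF via Create().

    Assumption: the GTiff driver honours STREAMABLE_OUTPUT for this path.
    """
    from osgeo import gdal  # lazy import: requires the GDAL Python bindings

    rows, cols = array.shape
    driver = gdal.GetDriverByName('GTiff')
    raster = driver.Create(path, cols, rows, 1, gdal.GDT_Float32,
                           options=CREATION_OPTIONS)
    if raster is None:
        # Create() signals failure by returning None rather than raising.
        raise RuntimeError('Create() failed for ' + path)
    # Write the whole band in one pass, top to bottom, with no read-back.
    raster.GetRasterBand(1).WriteArray(array)
    raster = None  # dereference to close the dataset and flush the output
```

It could be called as `create_streamable_raster('/vsizip/test.zip/testraster.tif', np.random.random((10, 10)))`. Checking the return value of Create for None gives a clearer error than the AttributeError in the question.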