
How to create and return a Zarr file from xarray Dataset?


How would I go about creating and returning a file new_zarr.zarr from an xarray Dataset?

I know xarray.Dataset.to_zarr() exists, but it returns a ZarrStore, and I must return a bytes-like object.

I have tried using the tempfile module but am unsure how to proceed. How would I write an xarray.Dataset to a bytes-like object that represents a .zarr file and can be downloaded?


Solution

  • Zarr supports multiple storage backends (DirectoryStore, ZipStore, etc.). If you are looking for a single file object, it sounds like the ZipStore is what you want.

    import xarray as xr
    import zarr
    
    ds = xr.tutorial.open_dataset('air_temperature')
    store = zarr.storage.ZipStore('./new_zarr.zip')
    ds.to_zarr(store)
    store.close()  # close the store so the zip archive is finalized on disk
    

    The zip file can be thought of as a single-file Zarr store, and it can be downloaded (or moved around) as a single unit.
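Since the question asks for a bytes-like object, one possible way to get there from a zip file on disk is to write it into a temporary directory and read it back. The sketch below is only an illustration under assumptions: `zip_file_to_bytes` is a hypothetical helper, and a tiny archive written with the standard-library `zipfile` module stands in for the store that `ds.to_zarr(store)` would produce, so that the snippet runs without xarray or zarr installed.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_file_to_bytes(path):
    """Return the full contents of the zip archive at `path` as bytes."""
    return Path(path).read_bytes()


with tempfile.TemporaryDirectory() as tmpdir:
    zip_path = Path(tmpdir) / "new_zarr.zip"

    # Stand-in for: ZipStore(zip_path), ds.to_zarr(store), store.close().
    # It writes a tiny archive holding a single zarr-style metadata entry.
    with zipfile.ZipFile(zip_path, mode="w") as zf:
        zf.writestr(".zgroup", '{"zarr_format": 2}')

    # Read the finished archive before the temporary directory is removed.
    payload = zip_file_to_bytes(zip_path)

print(type(payload).__name__, payload[:4])  # bytes b'PK\x03\x04'
```

In real use, the stand-in block would be replaced by the ZipStore code above, with the store closed before the file is read.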


    Update 1

    If you want to do this all in memory, you could extend zarr.ZipStore to allow passing in a BytesIO object:

    import os
    import zipfile
    from threading import RLock
    
    class MyZipStore(zarr.ZipStore):
        
        def __init__(self, path, compression=zipfile.ZIP_STORED, allowZip64=True, mode='a',
                     dimension_separator=None):
    
            # store properties
            if isinstance(path, str):  # this is the only change needed to make this work
                path = os.path.abspath(path)
            self.path = path
            self.compression = compression
            self.allowZip64 = allowZip64
            self.mode = mode
            self._dimension_separator = dimension_separator
    
            # Current understanding is that zipfile module in stdlib is not thread-safe,
            # and so locking is required for both read and write. However, this has not
            # been investigated in detail, perhaps no lock is needed if mode='r'.
            self.mutex = RLock()
    
            # open zip file
            self.zf = zipfile.ZipFile(path, mode=mode, compression=compression,
                                      allowZip64=allowZip64)
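
The reason that skipping the `os.path.abspath` call for non-string paths is enough is that the standard-library `zipfile.ZipFile` accepts either a path or a file-like object as its first argument. A minimal standard-library sketch of that behaviour, independent of zarr (the entry name `air/.zarray` is only an illustrative placeholder):

```python
import io
import zipfile

buffer = io.BytesIO()

# Same keyword arguments that MyZipStore forwards to ZipFile, but with an
# in-memory buffer where a filesystem path would normally go.
with zipfile.ZipFile(buffer, mode="a", compression=zipfile.ZIP_STORED,
                     allowZip64=True) as zf:
    zf.writestr("air/.zarray", "{}")  # placeholder entry, not real metadata

# ZipFile does not close a file object it was handed, so the buffer
# can be reopened and read afterwards.
with zipfile.ZipFile(buffer, mode="r") as zf:
    names = zf.namelist()

print(names)  # ['air/.zarray']
```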
    

    Then you can create the zip file in memory:

    import io
    
    zip_buffer = io.BytesIO()
    
    store = MyZipStore(zip_buffer)
    
    ds.to_zarr(store)
    store.close()  # closing the store writes the zip central directory into the buffer
    

    You'll notice that the zip_buffer contains a valid zip file:

    zip_buffer.getvalue()[:10]  # getvalue() ignores the current stream position
    b'PK\x03\x04\x14\x00\x00\x00\x00\x00'
    

    (PK\x03\x04 is the Zip file magic number)
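
To hand the result back as a bytes-like object, `zip_buffer.getvalue()` returns the whole buffer as `bytes`. One caveat worth checking: `zipfile` only writes the central directory when the archive is closed, so the store presumably needs to be closed before the bytes are taken. The sketch below illustrates this with the standard library alone; a plain `zipfile.ZipFile` stands in for `MyZipStore`, and `is_complete_zip` is a hypothetical helper.

```python
import io
import zipfile


def is_complete_zip(buffer):
    """Check a copy of the buffer so the original stream position is untouched."""
    return zipfile.is_zipfile(io.BytesIO(buffer.getvalue()))


zip_buffer = io.BytesIO()

# Stand-in for MyZipStore(zip_buffer) followed by ds.to_zarr(store).
zf = zipfile.ZipFile(zip_buffer, mode="a")
zf.writestr(".zgroup", '{"zarr_format": 2}')

# The entry data is in the buffer, but the central directory is not yet.
complete_before_close = is_complete_zip(zip_buffer)

zf.close()  # the analogue of store.close()
complete_after_close = is_complete_zip(zip_buffer)

# getvalue() returns everything written, regardless of the stream position.
payload = zip_buffer.getvalue()

print(complete_before_close, complete_after_close)  # False True
```

The resulting `payload` is an ordinary `bytes` object and could be returned from whatever download handler is in use.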