zarr not respecting chunk size from xarray and reverting to original chunk size


I'm opening a zarr store, rechunking it, and writing it back out to a different zarr store. Yet when I open the new store, it doesn't respect the chunk size I wrote. Here is the code and the output from Jupyter. Any idea what I'm doing wrong here?

import xarray as xr

bathy_ds = xr.open_zarr('data/bathy_store')
bathy_ds.elevation


bathy_ds.chunk(5000).elevation


bathy_ds.chunk(5000).to_zarr('data/elevation_store')
new_ds = xr.open_zarr('data/elevation_store')
new_ds.elevation


It is reverting to the original chunking, as if I'm not fully overwriting it or am missing some other setting that needs to change.


Solution

  • This seems to be a known issue, and there's a fair bit of discussion going on within the issue's thread and a recently merged PR.

    Basically, the dataset carries the original chunking around in the .encoding property. So when you call the second write operation, the chunks defined in ds[var].encoding['chunks'] (if present) will be used to write var to zarr, instead of the chunking you applied with .chunk().

    According to the conversation in the GH issue, the best solution at the moment is to manually delete the chunk encoding for the variables in question:

    for var in ds:
        # pop() rather than del: not every variable has a 'chunks' entry
        ds[var].encoding.pop('chunks', None)
    

    However, this seems to be an evolving situation, so it would be good to check in on the progress and adopt the final solution once there is one.

    Here's a little example that showcases the issue and solution:

    import xarray as xr
    
    # load data and write to initial chunking 
    x = xr.tutorial.load_dataset("air_temperature")
    x.chunk({"time":500, "lat":-1, "lon":-1}).to_zarr("zarr1.zarr")
    
    # display initial chunking
    xr.open_zarr("zarr1.zarr/").air
    


    # rechunk
    y = xr.open_zarr("zarr1.zarr/").chunk({"time": -1})
    
    # display
    y.air
    


    # write without modifying .encoding
    y.to_zarr("zarr2.zarr")
    
    # display
    xr.open_zarr("zarr2.zarr/").air
    


    # delete encoding and store
    del y.air.encoding['chunks']
    y.to_zarr("zarr3.zarr")
    
    # display
    xr.open_zarr("zarr3.zarr/").air
    

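If you have many variables, the manual deletion can be wrapped in a small helper. The following is only a minimal sketch: `drop_chunk_encoding` is a hypothetical name (not part of xarray), and the dataset is a synthetic stand-in whose `encoding['chunks']` entries are set by hand to mimic what `xr.open_zarr` records. It also clears coordinates, on the assumption that they can carry stale chunk encoding too.

```python
import numpy as np
import xarray as xr


def drop_chunk_encoding(ds: xr.Dataset) -> xr.Dataset:
    """Remove any 'chunks' entry from the encoding of every variable in ds.

    Hypothetical helper, not part of xarray. It edits the encoding
    dictionaries in place and returns the same dataset for convenience.
    """
    for name in ds.variables:  # covers data variables and coordinates
        # pop() with a default does not raise if the key is absent
        ds.variables[name].encoding.pop("chunks", None)
    return ds


# Small synthetic dataset standing in for one opened with xr.open_zarr.
demo = xr.Dataset(
    {"air": (("time", "lat"), np.zeros((10, 4)))},
    coords={"time": np.arange(10), "lat": np.arange(4)},
)

# Simulate the stale chunking that would be carried around in .encoding.
demo["air"].encoding["chunks"] = (5, 4)
demo["time"].encoding["chunks"] = (5,)

drop_chunk_encoding(demo)
print(demo["air"].encoding)
print(demo["time"].encoding)
```

In the example above you would call it as `drop_chunk_encoding(y).to_zarr("zarr3.zarr")`. Alternatively, `to_zarr` accepts an `encoding` argument (a dict keyed by variable name), which, as far as I can tell, can be used to state the desired `chunks` explicitly instead of deleting the old ones.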