I want to save an xarray.Dataset as a .zarr store, but I cannot get my chunks to be uniform and it will not save.
I have tried:
changing the chunk size when calling xarray.open_mfdataset -> it still uses automatic chunks, which do not work.
changing the chunk size with dataset.chunk(n) -> the dataset still has the automatic chunks from opening.
CODE:
import xarray as xr
import glob
import zarr
local_dir = "/directory/"
data_dir = local_dir + 'folder/'
files = glob.glob(data_dir + '*.nc')
n = 1320123
data_files = xr.open_mfdataset(files,concat_dim='TIME',chunks={'TIME': n}) # does not specify chunks, uses automatic chunks
data_files.chunk(n) # try modifying here, still uses automatic chunks
data_files.to_zarr(store=data_dir + 'test.zarr',mode='w') # I get an error about non-uniform chunks - see below
ValueError: Zarr requires uniform chunk sizes except for final chunk. Variable dask chunks ((1143410, 512447, 1170473, 281220, 852819),) are incompatible. Consider rechunking using chunk().
I expect the .zarr store to save with the new chunks, but it falls back to the original automatic chunk sizes.
Xarray's Dataset.chunk method returns a new dataset, so you need something more like:
ds = xr.open_mfdataset(files, concat_dim='TIME').chunk({'TIME': n})
ds.to_zarr(...)
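A minimal, self-contained illustration of why the reassignment matters, using a synthetic dask-backed dataset in place of your files:

```python
import numpy as np
import dask.array as da
import xarray as xr

# Stand-in for the concatenated dataset: non-uniform dask chunks along
# TIME, mimicking what open_mfdataset produces (one chunk per file).
arr = da.from_array(np.arange(10.0), chunks=((4, 2, 4),))
ds = xr.Dataset({"v": ("TIME", arr)})

ds.chunk({"TIME": 5})        # result discarded -- ds is unchanged
print(ds["v"].chunks)        # ((4, 2, 4),) still non-uniform

ds = ds.chunk({"TIME": 5})   # keep the returned dataset instead
print(ds["v"].chunks)        # ((5, 5),) uniform, safe for to_zarr
```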
A few other details to note:
Why the chunks kwarg to open_mfdataset doesn't behave as desired: currently, chunks along the concat_dim are fixed to the length of the data in each file. I suspect this is also why you have irregular chunk sizes.
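A small sketch of that behavior, concatenating two synthetic single-chunk datasets of different lengths in place of two files:

```python
import numpy as np
import xarray as xr

# Two "files" of different lengths, each held as one dask chunk --
# effectively what open_mfdataset does along the concat dimension.
a = xr.Dataset({"v": ("TIME", np.arange(3.0))}).chunk({"TIME": 3})
b = xr.Dataset({"v": ("TIME", np.arange(5.0))}).chunk({"TIME": 5})

combined = xr.concat([a, b], dim="TIME")
print(combined["v"].chunks)  # ((3, 5),) one chunk per input, non-uniform
```

Since the per-file chunk sizes follow the file lengths, files of unequal length yield exactly the kind of irregular chunking zarr rejects, hence the need to rechunk before writing.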
open_mfdataset will do the glob for you. This is a minor time saver, but something to note for the future: you can just call xr.open_mfdataset('/directory/folder/*.nc', ...).