I am attempting to use Xarray's open_mfdataset() function to open a large number of spatiotemporal files. However, some levels of the hierarchy have different numbers of files even though they result in the same dimensions.
Imagine the file structure that I want to process looks like this:
[
[r1_2000_2050.nc, r1_2050_2100.nc],
[r2_2000_2025.nc, r2_2025_2050.nc, r2_2050_2100.nc]
]
All of the dimensions match: the spatial dimensions are the same, and although the second sublist has more files, its temporal dimension still runs from 2000 to 2100. I have confirmed that I can combine these files through a manual series of xarray merges and concats, but I want to take advantage of open_mfdataset's parallel loading and chunking so that I don't load everything into memory.
When I try to load this structure with:
xr.open_mfdataset(nested_paths, combine='nested', concat_dim=['realization', 'time'])
I get this error:
ValueError: The supplied objects do not form a hypercube because sub-lists do not have consistent lengths along dimension0
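To make the error concrete, here is a simplified pure-Python sketch of the kind of consistency check that a nested list has to pass. It is not xarray's actual implementation; the helper names tile_ids and forms_hypercube are made up for illustration. Each file gets an index tuple giving its position at every nesting level, and every index along a given level must occur the same number of times.

```python
from collections import Counter


def tile_ids(nested, prefix=()):
    """Yield the index tuple (position at each nesting level) of every leaf."""
    for i, item in enumerate(nested):
        if isinstance(item, list):
            yield from tile_ids(item, prefix + (i,))
        else:
            yield prefix + (i,)


def forms_hypercube(nested):
    """Return True if every index occurs equally often along each nesting level."""
    ids = list(tile_ids(nested))
    depths = {len(t) for t in ids}
    if len(depths) != 1:
        # Leaves sit at different nesting depths (or there are no leaves at all).
        return False
    depth = depths.pop()
    for dim in range(depth):
        counts = Counter(t[dim] for t in ids)
        if len(set(counts.values())) != 1:
            # e.g. outer index 0 covers 2 files while outer index 1 covers 3.
            return False
    return True


nested_paths = [
    ["r1_2000_2050.nc", "r1_2050_2100.nc"],
    ["r2_2000_2025.nc", "r2_2025_2050.nc", "r2_2050_2100.nc"],
]
print(forms_hypercube(nested_paths))
```

With the structure above, the first realization contributes two files and the second contributes three, so the check fails at the outermost level, which matches the "dimension0" in the message.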
If this is possible to do with xarray, that would be extremely beneficial.
Unfortunately, Xarray doesn't currently support this. (I don't think you could even use Kerchunk to get around this, because it would imply "ragged"-length chunks.)
The reason xarray doesn't support this is because it would break the symmetry between dimensions. In your example, concatenating along 'time' then along 'realization' would be well-defined, but concatenating along 'realization' then 'time' would have a dimension mismatch. For the cases that combine supports right now, we can guarantee that either order would work.
We could perhaps imagine relaxing this constraint in xarray, so that combine='nested' would succeed in this case if dim=['time', 'realization'], but fail if dim=['realization', 'time'].
If this is something you would like to see in xarray then you are welcome to help contribute it as a new feature :) But it's not something we are likely to prioritize implementing soon. (If you want to try implementing it I would start by disabling the exceptions here and see how much further through the code it gets.)
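In the meantime, since concatenating along 'time' first and 'realization' second is well-defined, one possible workaround is to do those two steps yourself while still letting open_mfdataset handle the lazy loading. The sketch below is untested against your files; open_ragged_nested is a hypothetical helper name, and it assumes every realization ends up with the same time coordinate after its files are joined (otherwise xr.concat will align the time indexes and may pad with NaN).

```python
import xarray as xr


def open_ragged_nested(nested_paths, **open_kwargs):
    """Open each inner list lazily along 'time', then stack along 'realization'.

    Hypothetical helper, not part of xarray. Extra keyword arguments
    (e.g. parallel=True, chunks=...) are passed through to open_mfdataset.
    """
    per_realization = [
        # Each inner list is a flat list, so its length need not match the others.
        xr.open_mfdataset(paths, combine="nested", concat_dim="time", **open_kwargs)
        for paths in nested_paths
    ]
    # 'realization' does not exist in the files, so concat creates it as a new dimension.
    return xr.concat(per_realization, dim="realization")
```

You would call it as open_ragged_nested(nested_paths, parallel=True). The per-realization datasets are dask-backed, so the final concat should stay lazy rather than pulling everything into memory.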