python dask netcdf python-xarray

Can Xarray's open_mfdataset() function work with variable number of files in the nested structure?


I am attempting to use Xarray's open_mfdataset() function to open a large number of spatiotemporal files. However, some levels of the hierarchy have different numbers of files even though they result in the same dimensions.

Imagine the file structure that I want to process looks like this:

nested_paths = [
    ["r1_2000_2050.nc", "r1_2050_2100.nc"],
    ["r2_2000_2025.nc", "r2_2025_2050.nc", "r2_2050_2100.nc"],
]

All of the dimensions match: the spatial dimensions are identical, and although the second sublist has more files, its time dimension still runs from 2000 to 2100. I have confirmed that I can combine these files through a manual series of xarray merges and concats, but I want to take advantage of open_mfdataset's parallel loading and chunking so that I don't load everything into memory.

When I try to load this structure with:

xr.open_mfdataset(nested_paths, combine='nested', concat_dim=['realization', 'time'])

I get this error: ValueError: The supplied objects do not form a hypercube because sub-lists do not have consistent lengths along dimension0

If this is possible to do with xarray, that would be extremely beneficial.


Solution

  • Unfortunately Xarray doesn't currently support this. (I don't think you could even use Kerchunk to get around this, because it would imply "ragged"-length chunks.)

    Xarray doesn't support this because it would break the symmetry between dimensions. In your example, concatenating along 'time' and then along 'realization' would be well-defined, but concatenating along 'realization' and then 'time' would hit a dimension mismatch: the files at the same position in each sublist (for example r1_2050_2100.nc and r2_2025_2050.nc) cover different time ranges, and the third r2 file has no r1 counterpart. For the cases that combine supports right now, we can guarantee that either order would work.

    We could perhaps imagine relaxing this constraint in xarray, so that combine='nested' would succeed in this case if concat_dim=['time', 'realization'], but fail if concat_dim=['realization', 'time'].

    If this is something you would like to see in xarray then you are welcome to help contribute it as a new feature :) But it's not something we are likely to prioritize implementing soon. (If you want to try implementing it I would start by disabling the exceptions here and see how much further through the code it gets.)
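In the meantime, one possible workaround is to do the two steps yourself in the well-defined order: combine each sublist along 'time' first, then stack the results along 'realization'. The sketch below is an illustration rather than a tested recipe for your data. The helper names make_piece and open_ragged and the variable name "tas" are hypothetical, and small in-memory datasets stand in for the files. open_ragged is only defined, not called; it assumes dask and a netCDF backend are installed, in which case the per-sublist open_mfdataset calls should keep the data lazy.

```python
import numpy as np
import xarray as xr


def make_piece(start_year, end_year):
    """Build a small in-memory stand-in for one file (hypothetical data)."""
    years = np.arange(start_year, end_year)
    return xr.Dataset(
        {"tas": (("time", "y", "x"), np.zeros((years.size, 2, 3)))},
        coords={"time": years},
    )


# Same ragged layout as in the question: two pieces for r1, three for r2.
nested = [
    [make_piece(2000, 2050), make_piece(2050, 2100)],
    [make_piece(2000, 2025), make_piece(2025, 2050), make_piece(2050, 2100)],
]

# A single nested combine rejects the ragged list with a ValueError.
ragged_error = None
try:
    xr.combine_nested(nested, concat_dim=["realization", "time"])
except ValueError as err:
    ragged_error = str(err)

# Step 1: join each realization along time, so every row spans the same years.
per_realization = [xr.concat(row, dim="time") for row in nested]

# Step 2: the rows now have identical shapes and stack along a new dimension.
combined = xr.concat(per_realization, dim="realization")


def open_ragged(nested_paths):
    """Hypothetical file-based version of the same two steps.

    Assumes each sublist holds the files of one realization in time order.
    """
    rows = [
        xr.open_mfdataset(paths, combine="nested", concat_dim="time")
        for paths in nested_paths
    ]
    return xr.concat(rows, dim="realization")
```

With real files, open_ragged(nested_paths) would replace the single open_mfdataset call from the question. Passing parallel=True to each open_mfdataset call is an option if opening the files is the bottleneck.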