
Specify concat_dim for xarray open_mfdataset


I'm using xarray.open_mfdataset() to open and combine 8 netcdf files (output from model simulations with different settings) without loading them into memory. This works great if I specify concat_dim='run_number', which adds run_number as a dimension without coordinates and just fills it with values from 0 to 7.

The problem is that I now don't know which run_number belongs to which simulation. The original netCDF files all have attributes that help me distinguish them, e.g. identifyer=1, identifyer=2, etc., but xarray does not recognize this, even if I specify concat_dim='identifyer' (perhaps because there are many attributes?).

Is there any way in which I can tell xarray that it has to use this attribute as concat_dim? Or alternatively, in which order does xarray read the input files, so that I can infer which value of the new dimension corresponds to which simulation?


Solution

  • Xarray will use the values of existing scalar coordinates to label result coordinates, but it doesn't look at attributes. Only looking at metadata found in coordinates is a general theme in xarray: we leave attrs to user code only. So this should work if you assign scalar 'identifyer' coordinates to each dataset, e.g., using the preprocess argument to open_mfdataset:

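    import xarray

    def add_id(ds):
        # Copy the 'identifyer' attribute into a scalar coordinate so xarray can use it as a label.
        ds.coords['identifyer'] = ds.attrs['identifyer']
        return ds

    # path: glob pattern or list of the netCDF files to combine
    xarray.open_mfdataset(path, preprocess=add_id)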
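To see what the scalar coordinate buys you without touching any files, here is a minimal in-memory sketch. The three datasets, the variable name temperature and the run order (3, 1, 2) are made up for illustration; only add_id comes from the answer above. It uses xarray.concat directly rather than open_mfdataset, so it shows the labelling behaviour, not the file-opening step.

```python
import numpy as np
import xarray as xr


def add_id(ds):
    # Promote the 'identifyer' attribute to a scalar coordinate.
    ds.coords["identifyer"] = ds.attrs["identifyer"]
    return ds


# Hypothetical stand-ins for three model runs, each tagged only via attrs.
runs = [
    xr.Dataset(
        {"temperature": ("time", np.full(4, float(run_id)))},
        coords={"time": np.arange(4)},
        attrs={"identifyer": run_id},
    )
    for run_id in (3, 1, 2)
]

# Concatenating along the scalar coordinate turns it into a labelled dimension.
combined = xr.concat([add_id(ds) for ds in runs], dim="identifyer")

print(combined["identifyer"].values)
# Select a simulation by its identifier rather than by its position.
print(combined["temperature"].sel(identifyer=1).values)
```

With open_mfdataset itself, recent xarray releases may need the combination spelled out, e.g. combine='nested' together with concat_dim='identifyer'; treat that as an assumption to check against the version you have installed.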
    

    Alternatively, you can either pass an explicit list of filenames to open_mfdataset or rely on the fact that open_mfdataset sorts the glob of filenames before combining them: the datasets will always be combined in lexicographic order of their names.
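If you rely on the file order instead, keep in mind that lexicographic order is not numeric order. The sketch below uses hypothetical file names (run_1.nc, run_2.nc, run_10.nc) in place of a real glob result and shows which position along the new dimension each file would get after sorting.

```python
# Hypothetical file names; in practice this would be sorted(glob.glob(pattern)).
names = ["run_1.nc", "run_2.nc", "run_10.nc"]

# Lexicographic sort: "run_10.nc" lands before "run_2.nc".
ordered = sorted(names)

# Position along the concatenated dimension -> file it came from.
index_to_file = dict(enumerate(ordered))
for run_number, name in index_to_file.items():
    print(run_number, name)

# Zero-padded names sort the same way lexicographically and numerically.
padded = sorted(["run_10.nc", "run_02.nc", "run_01.nc"])
print(padded)
```

Building the list yourself and passing it to open_mfdataset keeps this mapping explicit, so you can look up which simulation sits at each index.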