Specify concat_dim for xarray open_mfdataset

I'm using xarray.open_mfdataset() to open and combine 8 netcdf files (output from model simulations with different settings) without loading them into memory. This works great if I specify concat_dim='run_number', which adds run_number as a dimension without coordinates and just fills it with values from 0 to 7.

The problem is that I now don't know which run_number belongs to which simulation. The original netCDF files all have attributes that help me distinguish them, e.g. identifyer=1, identifyer=2, etc., but xarray does not pick these up, even if I specify concat_dim='identifyer' (perhaps because there are many attributes?).

Is there any way in which I can tell xarray that it has to use this attribute as concat_dim? Or alternatively, in which order does xarray read the input files, so that I can infer which value of the new dimension corresponds to which simulation?

Solution

Xarray uses the values of existing scalar coordinates to label the coordinates of the result, but it does not look at attributes. Reading metadata only from coordinates is a general theme in xarray: attrs are left to user code. So this should work if you assign a scalar 'identifyer' coordinate to each dataset, e.g., using the preprocess argument to open_mfdataset:

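import xarray

def add_id(ds):
    # Promote the 'identifyer' attribute to a scalar coordinate
    ds.coords['identifyer'] = ds.attrs['identifyer']
    return ds

# path is the glob pattern (or list) of your netCDF files
xarray.open_mfdataset(path, preprocess=add_id)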
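To see what the scalar coordinate does during concatenation without touching any files, here is a minimal sketch using small in-memory datasets. The variable name temp, the dimension x, and the data values are made up for illustration; only the 'identifyer' attribute comes from the question. It calls xarray.concat directly, which is an assumption about how to mimic the combine step, not a claim about what open_mfdataset does internally. In recent xarray versions the equivalent open_mfdataset call may also need combine='nested' and concat_dim='identifyer'; treat that as something to check against your installed version.

```python
import numpy as np
import xarray as xr


def add_id(ds):
    # Promote the 'identifyer' attribute to a scalar coordinate.
    return ds.assign_coords(identifyer=ds.attrs["identifyer"])


# Three in-memory datasets standing in for the netCDF files (made-up data),
# deliberately created in a non-sorted order.
runs = [
    xr.Dataset({"temp": ("x", np.full(3, float(i)))}, attrs={"identifyer": i})
    for i in (3, 1, 2)
]

# Concatenating along the name of the scalar coordinate turns it into a
# dimension coordinate holding one label per input dataset.
combined = xr.concat([add_id(ds) for ds in runs], dim="identifyer")

print(combined["identifyer"].values)
print(combined.sel(identifyer=1)["temp"].values)
```

Because the new dimension carries the identifier values as labels, a simulation can be selected with .sel(identifyer=...) regardless of the order in which the files were read.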

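Alternatively, you can either pass an explicit list of filenames to open_mfdataset or rely on the fact that, when given a glob pattern, open_mfdataset sorts the matching filenames before combining them: the datasets are always combined in lexicographic order of their names. Keep in mind that lexicographic order is not numeric order, so a name ending in 10 sorts before one ending in 2 unless the numbers are zero-padded.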
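If you go the filename route, it helps to check the order up front. The sketch below uses hypothetical file names (run_1.nc, run_2.nc, run_10.nc) and a hypothetical natural_key helper. It shows how plain lexicographic sorting orders them, and how to build an explicit, numerically ordered list that you could pass to open_mfdataset instead of a glob pattern.

```python
import re

# Hypothetical file names; substitute the result of glob.glob(...) on your data.
names = ["run_10.nc", "run_2.nc", "run_1.nc"]

# Plain string sorting, the order you get from sorting a glob result.
lexicographic = sorted(names)


def natural_key(name):
    # Split into digit and non-digit chunks so numbers compare numerically.
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


# Explicit list in numeric order, suitable for passing as a list of paths.
natural = sorted(names, key=natural_key)

print(lexicographic)  # ['run_1.nc', 'run_10.nc', 'run_2.nc']
print(natural)        # ['run_1.nc', 'run_2.nc', 'run_10.nc']
```

With an explicit list, position i along the new dimension should correspond to the i-th file you passed, provided the datasets are combined in the order given (for example with combine='nested' in recent xarray versions, which is worth verifying for your version).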