I'm using xarray.open_mfdataset() to open and combine 8 netCDF files (output from model simulations with different settings) without loading them into memory. This works great if I specify concat_dim='run_number', which adds run_number as a dimension without coordinates and just fills it with values from 0 to 7.
The problem is that I now don't know which run_number belongs to which simulation. The original netCDF files all have attributes that help me distinguish them, e.g. identifyer=1, identifyer=2, etc., but xarray does not recognize this, even if I specify concat_dim='identifyer' (perhaps because there are many attributes?).
Is there any way in which I can tell xarray that it has to use this attribute as concat_dim? Or alternatively, in which order does xarray read the input files, so that I can infer which value of the new dimension corresponds to which simulation?
Xarray will use the values of existing scalar coordinates to label result coordinates, but it doesn't look at attributes. Only looking at metadata found in coordinates is a general theme in xarray: we leave attrs to user code only. So this should work if you assign scalar 'identifyer' coordinates to each dataset, e.g., using the preprocess argument to open_mfdataset:
import xarray

def add_id(ds):
    # Promote the 'identifyer' attribute to a scalar coordinate
    ds.coords['identifyer'] = ds.attrs['identifyer']
    return ds

xarray.open_mfdataset(path, preprocess=add_id)
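To see the effect of the scalar coordinate without any files on disk, here is a minimal sketch. It assumes small in-memory datasets as stand-ins for the model runs (the variable name temp and the identifier values 3, 1, 2 are made up for illustration), and it uses xarray.concat directly rather than open_mfdataset, so treat it as an illustration of the labelling behaviour rather than of the file-opening path.

```python
import numpy as np
import xarray as xr


def add_id(ds):
    # Promote the 'identifyer' attribute to a scalar coordinate.
    return ds.assign_coords(identifyer=ds.attrs["identifyer"])


# Hypothetical stand-ins for three model runs, deliberately out of order.
runs = [
    xr.Dataset(
        {"temp": ("time", np.full(4, float(i)))},
        attrs={"identifyer": i},
    )
    for i in (3, 1, 2)
]

# Concatenating along the name of the scalar coordinate turns its values
# into the labels of the new dimension.
combined = xr.concat([add_id(ds) for ds in runs], dim="identifyer")

ids = combined["identifyer"].values.tolist()
run_2_mean = float(combined["temp"].sel(identifyer=2).mean())
```

With the labels in place, a run can be selected by its identifier (for example combined.sel(identifyer=2)) instead of by its position in the file list.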
Alternatively, you can either pass an explicit list of filenames to open_mfdataset or rely on the fact that open_mfdataset sorts the glob of filenames before combining them: the datasets will always be combined in lexicographic order of their names.
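One thing to watch for with lexicographic order is that it is not numeric order. The sketch below uses only the standard library and hypothetical file names (run_1.nc, run_2.nc, run_10.nc) to show the order a sorted glob would produce, and how to map each position along the new dimension back to a file.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Create empty placeholder files with hypothetical names.
    for n in (1, 2, 10):
        (Path(tmp) / f"run_{n}.nc").touch()

    # Sort the matches the same way a sorted glob would.
    ordered = sorted(p.name for p in Path(tmp).glob("run_*.nc"))

# Position along the concatenated dimension -> file name.
index_to_file = dict(enumerate(ordered))
print(index_to_file)  # {0: 'run_1.nc', 1: 'run_10.nc', 2: 'run_2.nc'}
```

Zero-padding the numbers in the file names (run_01.nc, run_02.nc, run_10.nc) makes the lexicographic and numeric orders agree. Passing the same sorted list explicitly to open_mfdataset keeps the mapping unambiguous.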