I have multiple xarray datasets with the dimensions target-latitude (180) and target-longitude (360) and one variable, variable1. Each of these datasets represents a source gridcell and thus corresponds to a particular source-latitude and source-longitude; e.g., the dataset sourcelat25_sourcelon126_mm3_per_yr.nc corresponds to a gridcell with a source-latitude of 25 and a source-longitude of 126, and looks like this:
<xarray.Dataset>
Dimensions:          (targetlongitude: 360, targetlatitude: 180)
Coordinates:
    sourcelatitude   float64 25.0
    sourcelongitude  float64 126.0
  * targetlongitude  (targetlongitude) float64 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0
  * targetlatitude   (targetlatitude) float64 90.0 89.0 88.0 87.0 ... -87.0 -88.0 -89.0
Data variables:
    variable1        (targetlatitude, targetlongitude) float64 ...
My goal is to combine all datasets to obtain a dataset with complete source-latitude (180) and source-longitude (360) dimensions (as well as the target-latitude and target-longitude dimensions), like this:
<xarray.Dataset>
Dimensions:  (sourcelongitude: 360, sourcelatitude: 180, targetlongitude: 360, targetlatitude: 180)
I first tried to combine the datasets with xr.concat(), but that gave some issues. Then I tried xr.combine_by_coords(), as you can see in the code example below:
directory = 'specified_directory'
filenames = [f for f in os.listdir(directory) if f.startswith('start') and f.endswith('end.nc')]
combined_ds = None
for filename in filenames:
    ds = xr.open_dataset(os.path.join(directory, filename))
    if combined_ds is None:
        combined_ds = ds.copy()
    else:
        if 'sourcelatitude' in combined_ds.dims:
            ds = ds.expand_dims(dim=['sourcelatitude', 'sourcelongitude'])
            combined_ds = xr.combine_by_coords([combined_ds, ds], join='exact')
        else:
            ds = ds.expand_dims(dim=['sourcelatitude', 'sourcelongitude'])
            combined_ds = combined_ds.expand_dims(dim=['sourcelatitude', 'sourcelongitude'])
            combined_ds = xr.combine_by_coords([combined_ds, ds], join='exact')
This works for the first and the second iteration of the loop, and then gives me the error:
ValueError: Resulting object does not have monotonic global indexes along dimension sourcelongitude
Does anyone have any insights about how to solve this or perhaps another way to combine these datasets? I would appreciate it very much, thank you for reading!
The xarray docs have a section on combining along multiple dimensions with options for combining.
I am partial to combine_nested, as it allows you to be explicit about the ordering of data along each dim. But combine_by_coords works great too!
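To illustrate the combine_nested option, here is a minimal sketch on synthetic data. The helper make_cell, the tiny 3 x 4 target grid, and the fill values are assumptions made up for the example; they stand in for your real files. The outer list corresponds to the first entry of concat_dim and the inner lists to the second, so the ordering is stated explicitly rather than inferred from coordinate values.

```python
import numpy as np
import xarray as xr


def make_cell(src_lat, src_lon):
    """Hypothetical stand-in for one source-gridcell file (tiny target grid)."""
    target_lat = np.array([1.0, 0.0, -1.0])
    target_lon = np.array([0.0, 1.0, 2.0, 3.0])
    data = np.full((target_lat.size, target_lon.size), src_lat * 1000.0 + src_lon)
    ds = xr.Dataset(
        {"variable1": (("targetlatitude", "targetlongitude"), data)},
        coords={
            "targetlatitude": target_lat,
            "targetlongitude": target_lon,
            "sourcelatitude": src_lat,    # scalar coordinate, as in the question
            "sourcelongitude": src_lon,   # scalar coordinate, as in the question
        },
    )
    # Promote the two scalar coordinates to length-1 dimensions.
    return ds.expand_dims(["sourcelatitude", "sourcelongitude"])


src_lats = [25.0, 26.0]
src_lons = [126.0, 127.0, 128.0]

# Outer list: one entry per source latitude; inner list: one per source longitude.
nested = [[make_cell(lat, lon) for lon in src_lons] for lat in src_lats]

nested_combined = xr.combine_nested(
    nested, concat_dim=["sourcelatitude", "sourcelongitude"]
)
print(nested_combined.sizes)
```

With real files you would build the nested list from filenames sorted by source latitude and then by source longitude, which is the part combine_nested leaves up to you.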
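The biggest changes to your code I’d make are to expand the dims of every dataset first, collect them in a list, and combine them once at the end instead of pairwise inside the loop, and to chunk each dataset on opening so the data is loaded lazily. If you want the result in memory, you can add combined_ds = combined_ds.compute() at the end of the code below.
Here are my updates to your code: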
directory = 'specified_directory'
filenames = [
    f for f in os.listdir(directory)
    if f.startswith('start') and f.endswith('end.nc')
]
combined_ds = []
for filename in filenames:
    fp = os.path.join(directory, filename)
    ds = xr.open_dataset(fp).chunk()
    combined_ds.append(
        ds.expand_dims(
            ['sourcelatitude', 'sourcelongitude']
        )
    )
combined_ds = xr.combine_by_coords(
    combined_ds, join='exact'
)
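As a quick sanity check of the collect-then-combine pattern, the sketch below runs it on synthetic in-memory data instead of files. The helper fake_cell, the small target grid, and the deliberately shuffled input order are assumptions for illustration only. It suggests that a single combine_by_coords call over the full list orders the result by coordinate value, regardless of the order in which the pieces were appended.

```python
import numpy as np
import xarray as xr


def fake_cell(src_lat, src_lon):
    """Hypothetical replacement for xr.open_dataset on one source-gridcell file."""
    target_lat = np.array([1.0, 0.0, -1.0])
    target_lon = np.array([0.0, 1.0, 2.0, 3.0])
    data = np.full((target_lat.size, target_lon.size), src_lat * 1000.0 + src_lon)
    return xr.Dataset(
        {"variable1": (("targetlatitude", "targetlongitude"), data)},
        coords={
            "targetlatitude": target_lat,
            "targetlongitude": target_lon,
            "sourcelatitude": src_lat,
            "sourcelongitude": src_lon,
        },
    )


# Deliberately unsorted, similar to what os.listdir might return.
pieces = []
for src_lat in (26.0, 25.0):
    for src_lon in (128.0, 126.0, 127.0):
        ds = fake_cell(src_lat, src_lon)
        pieces.append(ds.expand_dims(["sourcelatitude", "sourcelongitude"]))

# One combine call over the whole list, rather than pairwise in the loop.
by_coords = xr.combine_by_coords(pieces, join="exact")
print(by_coords["sourcelatitude"].values, by_coords["sourcelongitude"].values)
```

Note that this example uses a complete 2 x 3 block of source cells; if some source cells are missing from your directory, the behaviour may differ.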
Alternatively, you could do this in one step and open the files in parallel with open_mfdataset:
def preprocess(ds):
    return ds.expand_dims(
        ['sourcelatitude', 'sourcelongitude']
    )

fps = [
    os.path.join(directory, filename)
    for filename in filenames
]
ds = xr.open_mfdataset(
    fps,
    combine="by_coords",
    parallel=True,
    preprocess=preprocess,
)
Separately, I’d also consider using float32. It’ll halve the size of your data and I doubt you have such high precision estimates of moisture flows that the difference would be significant.
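For a sense of scale, the full 180 x 360 x 180 x 360 array holds about 4.2 billion values, which is roughly 33.6 GB as float64 and 16.8 GB as float32. Here is a minimal sketch of the conversion on a single synthetic target grid; the random data and the name cell are assumptions for the example, not your real values.

```python
import numpy as np
import xarray as xr

# Synthetic single-cell dataset with the same target grid shape as in the question.
rng = np.random.default_rng(0)
cell = xr.Dataset(
    {"variable1": (("targetlatitude", "targetlongitude"), rng.random((180, 360)))},
    coords={
        "targetlatitude": np.arange(90.0, -90.0, -1.0),
        "targetlongitude": np.arange(0.0, 360.0, 1.0),
    },
)

# Convert only the data variable; the coordinates stay float64.
cell32 = cell.assign(variable1=cell["variable1"].astype("float32"))

# Largest absolute difference introduced by the cast, for this synthetic data.
max_abs_diff = float(np.abs(cell["variable1"] - cell32["variable1"]).max())

# Back-of-the-envelope size of the full combined variable.
n_values = (180 * 360) ** 2
full_bytes_f64 = n_values * 8
full_bytes_f32 = n_values * 4

print(cell["variable1"].nbytes, cell32["variable1"].nbytes, max_abs_diff)
print(full_bytes_f64 / 1e9, full_bytes_f32 / 1e9)
```

Whether the lost precision matters depends on the dynamic range of your actual values, so it is worth checking max_abs_diff (or a relative version of it) on a few real files before committing to float32.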