I have data arrays (361x361) for Jan, Feb, March, Apr, Oct, Nov and Dec for a given year.
So far I've been storing them in individual netcdfs for every month in the year (e.g. 03.nc, 10.nc)
I'd like to combine all months into one netcdf, so that I can do something like:
march_data = data.sel(month='03')
or alternatively data.sel(month=3))
So far I've only been able to stack the monthly data in a 361x361x7 array and it's unhelpfully indexed so that to get March data you need to do data[:,:,2] and to get October it's data[:,:,4]. Clearly 2 & 4 do not intuitively correspond to the months of March and October. This is in part because python is indexed from zero and in part because I'm missing the summer months. I could put nan fields in for the missing months, but that wouldn't solve the index-0 issue.
My attempt so far:
data = xarray.Dataset( data_vars={'ice_type':(['x','y','time'],year_array),},
coords={'lon':(['x','y'],lon_target),
'lat':(['x','y'],lat_target),
'month_number':(['time'],month_int)})
Here year_array
is a 361x361x7 numpy array, and month_int
is a list that maps the third index of year_array
to the month number: [1,2,3,4,10,11,12]
.
When I try to get Oct data with oct = data.sel(month_number=10)
it throws an error.
On a side note, I'm aware that there's possibly a solution to be found here, but to be honest I don't understand how it works. My confusion is mostly based around how they use 'time' both as a dictionary key and list of times at the same time.
I think I've written a helper function to do something just like that:
def combine_new_ds_dim(ds_dict, new_dim_name):
"""
Combines a dictionary of datasets along a new dimension using dictionary keys
as the new coordinates.
Parameters
----------
ds_dict : dict
Dictionary of xarray Datasets or dataArrays
new_dim_name : str
The name of the newly created dimension
Returns
-------
xarray.Dataset
Merged Dataset or DataArray
Raises
------
ValueError
If the values of the input dictionary were of an unrecognized type
"""
expanded_dss = []
for k, v in ds_dict.items():
expanded_dss.append(v.expand_dims(new_dim_name))
expanded_dss[-1][new_dim_name] = [k]
new_ds = xr.concat(expanded_dss, new_dim_name)
return new_ds
If you have all of the data in individual netcdfs then you should be able to import them into individual dataArray
's. Assuming you've done that, you could then do
month_das = {
1: january_da,
2: february_da,
...
12: december_da
}
year_data = combine_new_ds_dim(month_das, 'month')
which would be the concatenation of all of the data along the new dimension month
with the desired coordinates. I think the main loop of the function is easy enough to separate if you want to use that alone.
EDIT:
For anyone looking at this in the future, there's a much easier way of doing this with builtin xarray functions. You can just concatenate along a new dimension
year_data = xr.concat([january_da, february_da, ..., december_da], dim="month")
which will create a new dataArray
with the constituent arrays concatenated along a new dimension, but without coordinates on that dimension. To add coordinates,
year_data["month"] = [1, 2, ..., 12]
at which point year_data
will be concatenated along the new dimension "month" and will have the desired coordinates along that dimension.