I have many daily NetCDF files produced by a hydrological model, and I want to aggregate them to monthly/yearly resolution, either by summing or by averaging. For this, I use the following code:
import xarray as xr
nc_file = r'J:\RESULTS\WB_PRECIPITATION.nc'
ds = xr.open_dataset(nc_file)
monthly_data=ds.resample(time='Y',skipna=True).sum()
output = r'J:\RESULTS\WB_PRECIPITATION_YEARLY.nc'
monthly_data.to_netcdf(output, engine="netcdf4")
The problem is that my original daily file has several zones of NaN (_FillValue=-9999), and in the new NetCDF file those zones end up with the value 0. That distorts all the subsequent calculations.
I have already tried the "skipna" parameter with both True and False and got the same result.
In pandas, I have solved the same problem with the following code, but I have not been able to adapt it to this situation.
import numpy as np
import pandas as pd
def very_sum(array_like):
    if any(pd.isnull(array_like)):
        return np.nan
    else:
        return array_like.sum()
df = ...
df_yearly = df.resample('Y').apply(very_sum)
How can I resample my data without losing the zones with NaN?
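I think you only misplaced the skipna keyword: it belongs in the reduction method (sum) rather than in resample. By default, sum skips NaN for float data, so a period that contains only NaN sums to 0. This is basically a duplicate of: xarray resampling with certain nan treatment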
So instead of:
monthly_data=ds.resample(time='Y',skipna=True).sum()
Just do:
monthly_data=ds.resample(time='Y').sum(skipna=False)
As a runnable example:
import numpy as np
import pandas as pd
import xarray as xr
time = pd.date_range("2000-01-01", "2000-12-31")
da = xr.DataArray(data=np.ones(time.size), coords={"time": time}, dims=["time"])
da.data[:45] = np.nan
Default:
da.resample(time="m").sum()
<xarray.DataArray (time: 12)>
array([ 0., 15., 31., 30., 31., 30., 31., 31., 30., 31., 30., 31.])
Coordinates:
* time (time) datetime64[ns] 2000-01-31 2000-02-29 ... 2000-12-31
With skipna=False:
da.resample(time="m").sum(skipna=False)
<xarray.DataArray (time: 12)>
array([nan, nan, 31., 30., 31., 30., 31., 31., 30., 31., 30., 31.])
Coordinates:
* time (time) datetime64[ns] 2000-01-31 2000-02-29 ... 2000-12-31
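The same idea carries over to a gridded Dataset like the one in the question. The following is only a minimal sketch on synthetic data: the variable name precip, the 2x2 grid and the "YS" (year-start) frequency alias are assumptions chosen for illustration, not taken from the original file. It also shows min_count=1 as a possible alternative, in case skipna=False is too strict because a single missing day would turn the whole year into NaN.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of synthetic daily data on a hypothetical 2x2 grid.
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
precip = np.ones((time.size, 2, 2))
precip[:, 0, 0] = np.nan    # masked zone: NaN on every day
precip[:10, 1, 1] = np.nan  # ordinary cell with 10 missing days in 2000

ds = xr.Dataset(
    {"precip": (("time", "y", "x"), precip)},
    coords={"time": time, "y": [0, 1], "x": [0, 1]},
)

# Strict: any NaN inside a year makes that year's sum NaN.
strict = ds.resample(time="YS").sum(skipna=False)

# Lenient: NaNs are skipped, but a year with no valid value at all stays NaN.
lenient = ds.resample(time="YS").sum(skipna=True, min_count=1)

print(strict["precip"].values)
print(lenient["precip"].values)
```

In both variants the fully masked cell stays NaN instead of becoming 0. They differ only for the cell with a few missing days: the strict version returns NaN for the year 2000, while the lenient one sums the days that are present.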