Resampling netcdf from daily to monthly keeping nan values


I have many daily NetCDF files produced by a hydrological model, and I want to aggregate them to monthly or yearly resolution by either summing or averaging. For this, I use the following code:

import xarray as xr
    
nc_file = r'J:\RESULTS\WB_PRECIPITATION.nc'
ds = xr.open_dataset(nc_file)
monthly_data=ds.resample(time='Y',skipna=True).sum()
output = r'J:\RESULTS\WB_PRECIPITATION_YEARLY.nc'
monthly_data.to_netcdf(output, engine="netcdf4")

The problem is that my original daily file has several zones of NaN (_FillValue=-9999), and in the new NetCDF file those zones end up with the value 0, which distorts all subsequent calculations.

I have already tried the "skipna" parameter with both True and False, and I got the same result.

In pandas, I have handled the same problem with the following code, but I have not been able to adapt it to this situation.

import numpy as np
import pandas as pd 

def very_sum(array_like):
    if any(pd.isnull(array_like)):
        return np.nan
    else:
        return array_like.sum()

df = ... 
df_yearly = df.resample('Y').apply(very_sum)

How can I resample my data without losing the NaN zones?


Solution

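  • I think you only misplaced the skipna keyword: it belongs in the reduction method (sum) rather than in resample. With the default skipna behaviour for float data, NaNs are skipped, so a group that contains only NaN sums to 0. This is basically a duplicate of: xarray resampling with certain nan treatment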

    So instead of:

    monthly_data=ds.resample(time='Y',skipna=True).sum()
    

    Just do:

    monthly_data=ds.resample(time='Y').sum(skipna=False)
    

    As a runnable example:

    import numpy as np
    import pandas as pd
    import xarray as xr
    
    time = pd.date_range("2000-01-01", "2000-12-31")
    da = xr.DataArray(data=np.ones(time.size), coords={"time": time}, dims=["time"])
    da.data[:45] = np.nan
    

    Default:

    da.resample(time="M").sum()
    
    <xarray.DataArray (time: 12)>
    array([ 0., 15., 31., 30., 31., 30., 31., 31., 30., 31., 30., 31.])
    Coordinates:
      * time     (time) datetime64[ns] 2000-01-31 2000-02-29 ... 2000-12-31
    

    skipna=False:

    da.resample(time="M").sum(skipna=False)
    
    <xarray.DataArray (time: 12)>
    array([nan, nan, 31., 30., 31., 30., 31., 31., 30., 31., 30., 31.])
    Coordinates:
      * time     (time) datetime64[ns] 2000-01-31 2000-02-29 ... 2000-12-31
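
As a further sketch that goes beyond the original answer: skipna=False turns a whole period into NaN as soon as a single day is missing, which may be stricter than wanted for gridded data. If the goal is only to keep the permanently masked zones as NaN, the min_count argument of sum is one possible alternative. The grid below is synthetic, and the variable name "precip", the 1 x 3 layout and the "MS" (month start) frequency are assumptions chosen for illustration, not taken from the question's file.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily grid for Jan-Feb 2000: 60 days on a 1 x 3 grid (hypothetical data).
time = pd.date_range("2000-01-01", "2000-02-29", freq="D")
data = np.ones((time.size, 1, 3))
data[:, 0, 0] = np.nan  # cell x=0: masked zone, NaN on every day
data[9, 0, 1] = np.nan  # cell x=1: one missing day (10 January)
# cell x=2 keeps a complete record

da = xr.DataArray(
    data, coords={"time": time}, dims=["time", "y", "x"], name="precip"
)

# Default: NaNs are skipped, so the fully masked cell sums to 0.
default = da.resample(time="MS").sum()

# skipna=False: any NaN in a month makes that month's sum NaN.
strict = da.resample(time="MS").sum(skipna=False)

# min_count=1: NaNs are skipped, but a month with no valid value stays NaN.
lenient = da.resample(time="MS").sum(min_count=1)

for label, result in [("default", default), ("strict", strict), ("lenient", lenient)]:
    print(label)
    print(result.isel(y=0).values)
```

With min_count=1 the masked cell stays NaN in both months, while the cell with one missing day still gets a sum over its valid days. Whether that or the stricter skipna=False is appropriate depends on how gaps in the model output should be treated.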