To speed up calculations with the xarray package, I tried to add numba's guvectorize to my functions, but ran into several problems:
Given the two functions read_pr and day_clim, the input of day_clim is no longer an xarray object, since the guvectorize signature is set to float64[:], float64[:]. Thus, the groupby method does not work. I also tried xr.core.dataarray.DataArray[:], xr.core.dataarray.DataArray[:] as the signature, but that raises NameError: name 'xr' is not defined.
I would like to apply guvectorize to read_pr, too. However, guvectorize needs the types and shapes declared up front, and the size of each named dimension must stay the same between inputs and outputs.
For example:
(m),(n),(n) -> (m,n) # ok
(n),() -> (m,n) # error
The inputs to read_pr are strings and floats (shape: ()), while the output is an xarray object (type: <class 'xarray.core.dataarray.DataArray'>, shape: (l,m,n)).
Code:
from numba import float64, guvectorize
import numba
import numpy as np
import xarray as xr
path = '/data3/USERS/waynetsai/pyaos_wks_samples/data/'
fname = 'cmorph_sample.nc'
lats = -20
latn = 30
lon1 = 89
lon2 = 171
time1 = '2000-01-01'
time2 = '2020-12-31'
def read_pr(path, fname, time1, time2, lats, latn, lon1, lon2):
    with xr.open_dataset(path + fname) as pr_ds:
        pr = (pr_ds.sel(time=slice(time1, time2),
                        lat=slice(lats, latn),
                        lon=slice(lon1, lon2)).cmorph)
    return pr
pr = xr.apply_ufunc(read_pr, path, fname, time1, time2, lats, latn, lon1, lon2)
@guvectorize(
    "(float64[:], float64[:])",
    "(l,m,n) -> (l,m,n)"
)
def day_clim(pr):
    prGB = pr.groupby("time.day")
    prDayClim = prGB.mean("time")
    return prDayClim
prDayClim = xr.apply_ufunc(day_clim, pr)
All suggestions are welcome!
Numba does not support the functions of the xarray module, so you cannot use it to speed up read_pr and day_clim as written. To use Numba for functions like these, you would first have to extract NumPy arrays from the xarray objects. Even then, NumPy has no groupby function yet, so you would need to rewrite that logic yourself. And even if you did, I would not expect Numba to be faster in this case (unless you write a very optimized implementation yourself).
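To illustrate what rewriting the groupby step on plain NumPy arrays might look like, here is a minimal sketch. The function name day_of_month_mean and the toy data are hypothetical and not part of the question's code. It assumes you have already pulled out a (time, lat, lon) float array and a matching array of day-of-month integers, for example via pr.values and pr["time"].dt.day.values. Unlike xarray's groupby mean, this sketch always returns 31 slots and does not skip NaN values.

```python
import numpy as np


def day_of_month_mean(values, days):
    """Average `values` over time, separately for each day of month.

    values: float array of shape (time, lat, lon)
    days:   integer array of shape (time,), with entries in 1..31
    Returns an array of shape (31, lat, lon); days that never occur are NaN.
    """
    n_time, n_lat, n_lon = values.shape
    sums = np.zeros((31, n_lat, n_lon))
    counts = np.zeros(31)

    # Accumulate a running sum and a sample count per day of month.
    for t in range(n_time):
        d = days[t] - 1  # shift 1..31 to index 0..30
        sums[d] += values[t]
        counts[d] += 1

    # Divide only where at least one sample was seen.
    out = np.full((31, n_lat, n_lon), np.nan)
    for d in range(31):
        if counts[d] > 0:
            out[d] = sums[d] / counts[d]
    return out


# Toy usage: four time steps on a 1 x 2 grid, alternating day 1 and day 2.
toy_values = np.arange(8, dtype=np.float64).reshape(4, 1, 2)
toy_days = np.array([1, 2, 1, 2])
toy_clim = day_of_month_mean(toy_values, toy_days)
```

Explicit loops over NumPy arrays like these are the kind of code Numba is meant to compile, so this is roughly the form a Numba version would have to take. As said above, though, I would not assume it beats xarray's built-in groupby without measuring.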