python, numba, python-xarray

Use xarray and Numba to read data and calculate a climatology


To speed up calculations on xarray objects, I tried adding Numba's guvectorize decorator to my functions, but I ran into several problems:

  1. If I write two functions, read_pr and day_clim, the input of day_clim is no longer an xarray object, because the guvectorize signature is set to float64[:], float64[:]. As a result, groupby does not work. I also tried xr.core.dataarray.DataArray[:], xr.core.dataarray.DataArray[:], but that raises NameError: name 'xr' is not defined.
  2. I would also like to apply @guvectorize to read_pr. However, guvectorize requires the types and the shape layout to be declared up front, and each named dimension must keep the same size wherever it appears. For example,
    (m),(n),(n) -> (m,n)  # ok
    (n),() -> (m,n)  # error

The inputs to read_pr are strings and scalar numbers (shape: ()), while the output is an xarray DataArray (type: <class 'xarray.core.dataarray.DataArray'>, shape: (l,m,n)).

Code:

from numba import float64, guvectorize
import numba
import numpy as np
import xarray as xr

path = '/data3/USERS/waynetsai/pyaos_wks_samples/data/'
fname = 'cmorph_sample.nc'

lats = -20
latn =  30
lon1 =  89
lon2 = 171
time1 = '2000-01-01'
time2 = '2020-12-31'


def read_pr(path, fname, time1, time2, lats, latn, lon1, lon2):
    with xr.open_dataset(path + fname) as pr_ds:
        pr = (pr_ds.sel(time=slice(time1,time2),
                               lat=slice(lats,latn),
                               lon=slice(lon1,lon2)).cmorph)
    return pr

pr = xr.apply_ufunc(read_pr, path, fname, time1, time2, lats, latn, lon1, lon2)

@guvectorize(
    "(float64[:], float64[:])",
    "(l,m,n) -> (l,m,n)"
)
def day_clim(pr):
    prGB = pr.groupby("time.day")
    prDayClim = prGB.mean("time")
    return prDayClim
prDayClim = xr.apply_ufunc(day_clim, pr)

All suggestions are welcome!


Solution

  • Numba does not support xarray: it cannot compile code that calls xarray functions or operates on DataArray objects, so you cannot use it to speed up read_pr or day_clim as written. To use Numba for such functions, you would first have to extract NumPy arrays from the xarray objects (for example with .values). Even then, NumPy has no groupby function, so you would need to implement the grouping yourself. And even if you do that, I would not expect Numba to be faster in this case unless you write a very optimized implementation yourself.