I have an array of values called speed, and I'm mapping it to another array of values of the same shape called power by looking up the nearest value in a lookup table speed_to_power_lut. This process takes about 2.5 seconds on my machine, and I want to speed it up.
import time
import numpy as np
import xarray as xr
LON = np.arange(0, 360, 0.25)
LAT = np.arange(-90, 90, 0.25)
TIME = np.arange(0, 24)
speed = xr.DataArray(
    np.random.uniform(high=10, size=(len(LON), len(LAT), len(TIME))),
    coords={'lon': LON, 'lat': LAT, 'time': TIME})
speed_to_power_lut = xr.DataArray(
    np.random.uniform(high=100.0, size=(100,)),
    coords={'speed': np.arange(0, 10, 0.1)})
start = time.perf_counter()
power = speed_to_power_lut.sel(speed=speed, method='nearest')
print(f'Without chunk: {time.perf_counter() - start:.3f} s')
speed = speed.chunk({'lon': len(LON) // 16})
start = time.perf_counter()
power = speed_to_power_lut.sel(speed=speed, method='nearest')
print(f'With chunk: {time.perf_counter() - start:.3f} s')
The xarray documentation suggests that, if I chunk the array, Dask will automatically be used under the hood to make things faster. Unfortunately, that's not what I'm seeing:
Without chunk: 2.477 s
With chunk: 2.499 s
I'm somewhat new to xarray and entirely new to Dask, so maybe I'm just missing something trivial. Or is this particular use case not parallelized?
It's probably because speed_to_power_lut is not a Dask array. In any case, given how sel works here, I don't think Dask would help much with this operation.
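A quick way to check (just a sketch against the setup in your question) is to look at what actually backs each array after the chunked call:
# Sanity check: inspect what backs each DataArray after the chunked .sel().
print(type(speed.data))               # dask array after speed.chunk(...)
print(type(speed_to_power_lut.data))  # plain numpy.ndarray -- the LUT was never chunked
print(type(power.data))               # shows whether .sel() produced a lazy dask result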
Have you considered something like this:
# Pad the table so a speed that rounds up to 10.0 (index 100) stays in range.
speed_to_power_lut = np.append(speed_to_power_lut.values, speed_to_power_lut.values[99])
# Nearest 0.1 bin as an integer index; round before casting so float error
# (e.g. 0.7*10 == 6.999...) doesn't truncate to the wrong bin.
index = (speed * 10).round().astype(int)
power = speed_to_power_lut[index]
It's very hacky in this case, but in general I think a NumPy lookup applied to the underlying values of the DataArray will be quicker than this somewhat esoteric sel logic.
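One more thing to watch for: as far as I can tell, indexing a plain NumPy array with a DataArray gives you back a bare ndarray, so the lon/lat/time labels are dropped. If you want a fully-labelled result you can wrap the lookup yourself; a minimal sketch continuing from the snippet above:
# Sketch: same lookup, but rebuild a DataArray so the result keeps the
# lon/lat/time coordinates of `speed`.
power = xr.DataArray(speed_to_power_lut[index.values],
                     coords=speed.coords, dims=speed.dims)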