I have a module that contains a few `numba.jit` and `numba.guvectorize` functions. When I import the module, it takes a few seconds to compile the code.

My question is: what's the best way to make sure the code is compiled and ready on the workers before computation starts? Or is there actually any benefit in compiling it beforehand?

My current solution is the following, which I don't think is particularly elegant, and I'm not sure if it's even useful:
```python
from distributed import Client

import foo  # module with numba code


def callback():
    import foo

client = Client()
client.register_worker_callbacks(callback)
```
This way, the code is compiled on each worker when the callback is registered, and is ready by the time computation starts.
So, is this necessary? Is there a better way to handle this? Any pointers on handling numba code with dask?
You are right in your assumption that a numba function will be compiled on each worker when first accessed: at import time if you use explicit (eager) compilation, or when the function is first called if compilation is implicit (lazy). This means that the first run of a task using the function will incur additional latency.
I would say that not having the worker callback is the "typical" workflow, and it doesn't have any serious drawbacks. By forcing a compile early, you ensure that subsequent tasks using the function have similar performance. That might in rare cases be useful, since the time-to-complete of previous tasks is used by dask's task-stealing mechanism, and perhaps you want the performance page of the dashboard not to include the compilation time.
You could also achieve pre-compilation using a worker startup (preload) script (see here). Your solution is simpler, but won't survive worker restarts.
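A preload script might look like the following sketch. It relies on distributed's preload mechanism, which looks for a `dask_setup` hook in the given file; `foo` is the question's module, so this is not runnable standalone:

```python
# precompile.py -- pass to each worker with:
#   dask-worker <scheduler-address> --preload precompile.py

def dask_setup(worker):
    # Importing the module here triggers numba compilation once per
    # worker process. Because preload scripts are re-run whenever a
    # worker starts, this also covers worker restarts, unlike the
    # register_worker_callbacks approach in the question.
    import foo  # module with numba code
```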