Numba/CUDA - Calling vectorized library function

I would like to use Numba to get the GPU to run the following function defined here, which has the signature:

@numba.vectorize(_signatures)
def ppf(p, df, mu, sigma):
    # do stuff here

I'm not sure if this is relevant, but I noticed that because this function is defined in a separate package, I can't specify in the decorator target='cuda'.

For example, consider the following snippet:

from numba import cuda, vectorize

@cuda.jit('float32(float32, float32, float32)', device=True)
def cu_device_fn(x, y, z):
    return x ** y / z

# define a ufunc that calls our device function
@vectorize(['float32(float32, float32, float32)'], target='cuda')
def cu_ufunc(x, y, z):
    return cu_device_fn(x, y, z)

Straightforwardly, I can call this function directly via cu_ufunc(10, 2, 2).

How do I do the same thing, i.e., get the GPU to run this imported Numba function?

Sorry for the lack of details. I'm a complete beginner to GPU programming and just totally lost.

Solution

I'm not sure if this is relevant, but I noticed that because this function is defined in a separate package, I can't specify in the decorator target='cuda'.

The problem isn’t that the function is defined in another module; it is that the function is already decorated. You can’t swap the decorator after import, because by then the name has already been bound to the decorated implementation rather than to the original function.

When you see something like this:

@numba.vectorize(_signatures)
def ppf(p, df, mu, sigma):
    # do stuff here

what is really happening is this:

def ppf(p, df, mu, sigma):
    # do stuff here

ppf = numba.vectorize(_signatures)(ppf)

i.e. the original Python function is passed to the vectorization system, and the resulting wrapper around the compiled, vectorized code is bound to the function's name. The original function no longer exists under that name (except perhaps as some internal version of the interpreter bytecode), and there is, therefore, no way to decorate it again.

You will have to fork the code and modify it to suit your purposes. Note that it is perfectly possible to do something like this:

# keep the undecorated function and build one ufunc per target from it
def _ppf(p, df, mu, sigma):
    # do stuff here

ppf = numba.vectorize(_signatures)(_ppf)
ppf_gpu = numba.vectorize(_signatures, target='cuda')(_ppf)

which leaves you with two different JIT'd functions from the original source for different targets.

Postscript: It is implied here that whatever # do stuff here is, it will be pure Python code which only uses language features and functions that Numba can compile for both the CUDA and CPU vectorization targets. If that is not the case, then nothing in this answer will help you.