How can I get PyCUDA's SourceModule to compile multiple source files containing device code?

I'm trying to use some LAPACKE functions inside a CUDA kernel to solve small systems of linear equations. I have a main source file that contains the kernel function I want to call. Inside that kernel function I want to call the LAPACKE function LAPACKE_dgesv(), which is defined in a different source file.

In my main source file I have included the header file lapacke.h, which contains the declaration of LAPACKE_dgesv(). In addition, I have edited lapacke.h to prepend __device__ to the declaration of LAPACKE_dgesv().

I added the directory of the source file that contains the definition of LAPACKE_dgesv() to the include_dirs argument of the SourceModule call in my Python code. However, when I run the code I get this error:

ptxas fatal   : Unresolved extern function 'LAPACKE_dgesv'

My guess is that the source file containing the definition of LAPACKE_dgesv() is not being compiled.

Is there a way to get PyCUDA to compile multiple source files that contain device code? It seems PyCUDA would need a way to run the CUDA compiler with the --relocatable-device-code=true flag.

Solution

No, you can't do this with SourceModule. It compiles the single source string you pass it as one translation unit, and include_dirs only tells the compiler where to look for headers, so the file that defines LAPACKE_dgesv() is never compiled.

There is an experimental DynamicSourceModule, added to the master branch very recently, which can probably do what you want, although it isn't well documented and I have never used it. Otherwise, you can always statically compile and device-link the code to a cubin file yourself, outside of PyCUDA, and load the resulting device code via the standard APIs.
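As a rough sketch of the second route (compile and device-link outside PyCUDA, then load the cubin), something like the following might work. The file names main.cu and solver.cu, the kernel name solve_kernel, the sm_70 architecture and the helper function names are all placeholders I made up for illustration. It also assumes nvcc is on your PATH and writes .o object files into the current directory, as it does on Linux. Note that separate compilation only helps if the function you call has a real __device__ implementation; adding __device__ to a declaration in a header does not create one.

```python
import subprocess
from pathlib import Path


def build_nvcc_commands(sources, arch="sm_70", cubin="kernels.cubin", include_dirs=()):
    """Return the two nvcc command lines: separate compilation, then device link.

    All names and defaults here are illustrative assumptions, not PyCUDA API.
    """
    include_flags = [f"-I{d}" for d in include_dirs]
    # Assumption: nvcc -dc writes <stem>.o into the current directory (Linux).
    objects = [Path(src).with_suffix(".o").name for src in sources]
    # -dc compiles each source to an object with relocatable device code.
    compile_cmd = ["nvcc", f"-arch={arch}", "-dc", *include_flags, *sources]
    # -dlink -cubin resolves cross-file device calls and emits a single cubin.
    link_cmd = ["nvcc", f"-arch={arch}", "-dlink", "-cubin", *objects, "-o", cubin]
    return compile_cmd, link_cmd


def build_cubin(sources, **kwargs):
    """Run both nvcc steps; raises CalledProcessError if either one fails."""
    for cmd in build_nvcc_commands(sources, **kwargs):
        subprocess.run(cmd, check=True)


def load_kernel(cubin, kernel_name):
    """Load a prebuilt cubin with PyCUDA and look up one kernel by name."""
    # Imported lazily so the command builder can be used without a GPU.
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    import pycuda.driver as cuda

    module = cuda.module_from_file(cubin)
    # The kernel must be declared extern "C" in the .cu file; SourceModule
    # normally adds that wrapper for you, but a hand-built cubin does not get it.
    return module, module.get_function(kernel_name)
```

Usage would then be along the lines of build_cubin(["main.cu", "solver.cu"]) followed by module, kernel = load_kernel("kernels.cubin", "solve_kernel"), after which kernel is launched like any function obtained from SourceModule.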