I'm trying to use some LAPACKE functions inside a CUDA kernel to solve small systems of linear equations. I have a main source file that contains the kernel function I want to call. Inside that kernel function I want to call the LAPACKE function LAPACKE_dgesv(), which is defined in a different source file.
In my main source file I have included the header file lapacke.h, which contains the declaration for LAPACKE_dgesv(). In addition, I have edited lapacke.h to prepend __device__ to the function declaration of LAPACKE_dgesv().
I added the directory of the source file that contains the definition of LAPACKE_dgesv() to the include_dirs argument of the SourceModule call in my Python code. However, when I run the code I get this error:
ptxas fatal : Unresolved extern function 'LAPACKE_dgesv'
My guess is that the source file containing the definition of LAPACKE_dgesv() is not being compiled.
Is there a way to get PyCUDA to compile multiple source files that contain device code? It seems that there would need to be a way for PyCUDA to run the CUDA compiler with the --relocatable-device-code=true flag.
No, you can't do this with SourceModule.
There is an experimental DynamicSourceModule, which was added to the master branch very recently and which can probably do what you want, although it isn't well documented and I have never used it. Otherwise, you can always statically compile and device-link the code to a cubin file yourself outside of PyCUDA and just load the resulting device code via the standard APIs.
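For the second route, here is a rough sketch of what "compile and device-link to a cubin yourself, then load it" might look like when driven from Python. The file names, the sm_70 architecture and the helper names cubin_build_commands and build_and_load are made up for illustration, and the nvcc invocations (-dc to compile with relocatable device code, -dlink -cubin to device-link into one cubin) are what I would expect to work rather than something tested against your setup. It also only helps if the other source file really contains device code. Prepending __device__ to a declaration in lapacke.h does not turn the host-compiled LAPACKE library into something a kernel can call.

```python
import subprocess
from pathlib import Path


def cubin_build_commands(sources, arch="sm_70", output="kernels.cubin"):
    """Return nvcc command lines for separate compilation plus device linking.

    Hypothetical helper: the flag combination is an assumption, adjust as needed.
    """
    objects = [str(Path(src).with_suffix(".o")) for src in sources]
    # -dc compiles each file to an object holding relocatable device code.
    compile_cmds = [
        ["nvcc", f"-arch={arch}", "-dc", src, "-o", obj]
        for src, obj in zip(sources, objects)
    ]
    # -dlink -cubin resolves cross-file device calls and emits a single cubin.
    link_cmd = ["nvcc", f"-arch={arch}", "-dlink", "-cubin", *objects, "-o", output]
    return compile_cmds + [link_cmd]


def build_and_load(sources, kernel_name, arch="sm_70", output="kernels.cubin"):
    """Run the build commands, then load the cubin through PyCUDA's driver API."""
    for cmd in cubin_build_commands(sources, arch, output):
        subprocess.run(cmd, check=True)
    # Imported here so the command builder above can be used without a GPU.
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    import pycuda.driver as cuda

    module = cuda.module_from_file(output)
    return module.get_function(kernel_name)
```

You would call something like build_and_load(["main_kernel.cu", "solver.cu"], "my_kernel") and launch the returned function the same way as one obtained from SourceModule. Since nothing here wraps the source in extern "C" for you, the kernel needs to be declared extern "C" in the .cu file, otherwise get_function will not find it under its unmangled name.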