I'm working on a project that involves creating CUDA kernels in Python. Numba works quite well (what these guys have accomplished is quite incredible), and so does PyCUDA.
My problem is that I want to call a C device function from my Python generated kernel. I couldn't find a way to accomplish this. Numba can call CFFI modules but only in CPU code. In PyCUDA I can add my C device functions to the SourceModule, but I couldn't figure out how to include functions that already exist in another library.
Is there a way to accomplish this?
As far as I am aware, this isn't possible in either language. Neither exposes the necessary toolchain controls for separate compilation or APIs to do runtime linking of device code.