I have a fixed kernel and I want to let users supply their own device functions to alter its output. The user-defined functions always take the same input arguments and always return a scalar value. If I knew the user-defined functions at compile time, I could just pass them to the kernel as function pointers (with a default device function that operates on the input when none is given). I have access to the user-defined function's PTX code at runtime, and I am wondering whether I could use something like NVIDIA's Jitify to compile that PTX at runtime, get a pointer to the device function, and then pass it to the precompiled kernel.
I have seen a few posts that come close to answering this (e.g. "How to generate, compile and run CUDA kernels at runtime"), but most suggest compiling the entire kernel together with the device function at runtime. Given that the device function has fixed inputs and outputs, I don't see any reason why the kernel couldn't be compiled ahead of time. The piece I am missing is how to compile just the device function at runtime and get a pointer to it that I can then pass to the kernel.
You can do that as follows:
The key point is to launch the kernel from the module you generate at runtime, not from the module that is implicitly linked into your program.