Suppose I want to use CUDA's lower-level driver API on some source I've written. I know about cuLaunchKernel, but I can't seem to find in the docs the exact explanation of how you get the CUfunction to pass to it from your __global__ functions.
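You use cuModuleGetFunction, which looks up a kernel by name in a CUmodule that you have already loaded (for example with cuModuleLoad or cuModuleLoadData). The name you pass must be the mangled C++ name unless the kernel is declared with C linkage (extern "C"). You can find the mangled name by running cuobjdump on a compiled version of your device source.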