cuda nvcc

How do I get a CUfunction from a __global__ function I've written?


Suppose I want to use CUDA's lower-level driver API on some source I've written. I know about cuLaunchKernel, but I can't find in the docs an exact explanation of how to get the CUfunction to pass to it from my __global__ functions.


Solution

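  • You use cuModuleGetFunction on a module you have loaded from your compiled device code (for example with cuModuleLoad). The function name you pass must be the mangled C++ name unless the kernel is declared with C linkage (extern "C"), in which case the plain name works. You can find the mangled name by running cuobjdump on a compiled version of your device source.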
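
To make this concrete, here is a minimal sketch of the whole path from a __global__ function to cuLaunchKernel. Everything specific in it is an assumption for illustration: the kernel name `scale`, the file name `kernel.ptx`, the use of device 0 and its primary context, and the launch dimensions. The kernel is declared extern "C" so that its plain name can be passed to cuModuleGetFunction.

```cpp
// Assumed device source, kernel.cu (hypothetical), compiled separately with:
//   nvcc -ptx kernel.cu -o kernel.ptx
//
//   extern "C" __global__ void scale(float* data, float factor, int n) {
//       int i = blockIdx.x * blockDim.x + threadIdx.x;
//       if (i < n) data[i] *= factor;
//   }
//
// Without extern "C" the entry point would carry a mangled name
// (for this signature, _Z5scalePffi), and that is what you would pass instead.
//
// Host side: link against the driver library, e.g. g++ host.cpp -lcuda

#include <cuda.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Abort with a readable message if a driver API call fails.
static void check(CUresult r, const char* what) {
    if (r != CUDA_SUCCESS) {
        const char* msg = nullptr;
        cuGetErrorString(r, &msg);
        std::fprintf(stderr, "%s failed: %s\n", what, msg ? msg : "unknown error");
        std::exit(1);
    }
}

int main() {
    // Initialise the driver API and make device 0's primary context current.
    check(cuInit(0), "cuInit");
    CUdevice dev;
    check(cuDeviceGet(&dev, 0), "cuDeviceGet");
    CUcontext ctx;
    check(cuDevicePrimaryCtxRetain(&ctx, dev), "cuDevicePrimaryCtxRetain");
    check(cuCtxSetCurrent(ctx), "cuCtxSetCurrent");

    // Load the compiled device code, then look the kernel up by name.
    CUmodule mod;
    check(cuModuleLoad(&mod, "kernel.ptx"), "cuModuleLoad");
    CUfunction fn;
    check(cuModuleGetFunction(&fn, mod, "scale"), "cuModuleGetFunction");

    // Prepare some device data for the kernel to work on.
    const int n = 1024;
    std::vector<float> host(n, 1.0f);
    CUdeviceptr dptr;
    check(cuMemAlloc(&dptr, n * sizeof(float)), "cuMemAlloc");
    check(cuMemcpyHtoD(dptr, host.data(), n * sizeof(float)), "cuMemcpyHtoD");

    // Kernel arguments are passed as an array of pointers to each argument.
    float factor = 2.0f;
    int count = n;
    void* args[] = { &dptr, &factor, &count };

    const unsigned threads = 256;
    const unsigned blocks = (n + threads - 1) / threads;
    check(cuLaunchKernel(fn,
                         blocks, 1, 1,    // grid dimensions
                         threads, 1, 1,   // block dimensions
                         0, nullptr,      // shared memory bytes, stream
                         args, nullptr),
          "cuLaunchKernel");
    check(cuCtxSynchronize(), "cuCtxSynchronize");

    check(cuMemcpyDtoH(host.data(), dptr, n * sizeof(float)), "cuMemcpyDtoH");
    std::printf("host[0] = %f\n", host[0]);

    // Release resources in reverse order of acquisition.
    cuMemFree(dptr);
    cuModuleUnload(mod);
    cuDevicePrimaryCtxRelease(dev);
    return 0;
}
```

If you would rather not use extern "C", the mangled name also appears on the `.entry` line of the generated PTX file, so you can copy it from there into the cuModuleGetFunction call.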