
Have GTX Titan, having trouble with Dynamic Parallelism

I'm trying to call a CUDA kernel from another kernel, but I get the following error:

Traceback (most recent call last):
  File "C:\temp\GPU Program", line 22, in <module>
  File "C:\Python33\lib\site-packages\pycuda\", line 262, in __init__
    arch, code, cache_dir, include_dirs)
  File "C:\Python33\lib\site-packages\pycuda\", line 252, in compile
    return compile_plain(source, options, keep, nvcc, cache_dir)
  File "C:\Python33\lib\site-packages\pycuda\", line 134, in compile_plain
    cmdline, stdout=stdout.decode("utf-8"), stderr=stderr.decode("utf-8"))
pycuda.driver.CompileError: nvcc compilation of         c:\users\karste~1\appdata\local\temp\tmpgq8t45\ failed
[command: nvcc --cubin -arch sm_35 -m64 -Ic:\python33\lib\site-packages\pycuda\cuda]
[stderr: error: kernel launch from __device__ or __global__ functions requires separate         compilation mode

My understanding is that this has to do with Dynamic Parallelism, and that the other question about this error came from a user without appropriate hardware. I have a GTX Titan, however, so it should be compatible. What am I missing?


After adding "options=['--cubin','-rdc=true' ,'-lcudart', '-lcudadevrt,','-Ic:\python33\lib\site-packages\pycuda\cuda']" to SourceModule, I get the following error:

Traceback (most recent call last):
  File "C:\temp\GPU Program", line 22, in <module>
""", options=['--cubin','-rdc=true' ,'-lcudart', '-lcudadevrt,','-Ic:\python33\lib\site-packages\pycuda\cuda'])
  File "C:\Python33\lib\site-packages\pycuda\", line 265, in __init__
self.module = module_from_buffer(cubin)
pycuda._driver.LogicError: cuModuleLoadDataEx failed: not found - 


  • PyCUDA is compiling the CUDA code on the fly, using this command:

    nvcc --cubin -arch sm_35 -m64 -Ic:\python33\lib\site-packages\pycuda\cuda

    To compile code that uses dynamic parallelism, the compile command needs switches that enable separate compilation (-rdc=true), device code linking, and linking against the device runtime library (-lcudadevrt), along with an appropriate architecture target (sm_35).

    The CUDA programming guide's section on dynamic parallelism gives some examples of valid nvcc command combinations.

    Your command line should look something like:

    nvcc --cubin -arch=sm_35 -m64 -rdc=true -Ic:\python33\lib\site-packages\pycuda\cuda -lcudadevrt

    You may also wish to read the nvcc manual on separate compilation.
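
As a rough illustration of the switches discussed above, here is a minimal Python sketch that assembles the suggested nvcc argument list. The helper name `build_nvcc_command` and the file name `kernel.cu` are hypothetical and not part of PyCUDA or nvcc. The sketch only builds the command; it does not run nvcc, and it makes no claim that the resulting cubin will load through SourceModule.

```python
def build_nvcc_command(source_path, include_dir, arch="sm_35"):
    """Assemble an nvcc argument list for code that uses dynamic parallelism.

    Hypothetical helper for illustration only. Assumes `arch` has the
    plain form "sm_NN", for example "sm_35".
    """
    # Dynamic parallelism needs compute capability 3.5 or higher.
    capability = int(arch.split("_")[1])
    if capability < 35:
        raise ValueError("dynamic parallelism requires sm_35 or higher, got " + arch)

    return [
        "nvcc",
        "--cubin",
        "-arch=" + arch,     # architecture target
        "-m64",
        "-rdc=true",         # separate compilation (relocatable device code)
        "-I" + include_dir,  # include path, as in the original command
        "-lcudadevrt",       # device runtime library
        source_path,
    ]


if __name__ == "__main__":
    # Placeholder source file; the include path is the one from the question.
    cmd = build_nvcc_command("kernel.cu", r"c:\python33\lib\site-packages\pycuda\cuda")
    print(" ".join(cmd))
```

Keeping the arguments in a list makes it easy to compare them flag by flag with the command that appears in the PyCUDA traceback.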