PROBLEM
I have an FFT-based application that uses FFTW3. I am working on porting the application to a CUDA-based implementation using CUFFT. Compiling and running the FFT core of the application standalone within Nsight works fine. I have moved from there to integrating the device code into my application.
When I run with the CUFFT core code integrated into my application, cudaGetDeviceCount returns a cudaErrorInsufficientDriver error, although I did not get this error in the standalone Nsight run. This call is made at the beginning of the run, when I'm initializing the GPU.
BACKGROUND
I am running on CentOS 6, using CUDA 7.0 on a GeForce GTX 750, and icpc 12.1.5. I have also successfully tested a small example using a GT 610. Both cards work in Nsight (and I've also compiled and run from the command line without problems, though not as extensively as from within Nsight).
To integrate the CUFFT implementation of the FFT core into my application, I compiled and device-linked with nvcc and then used icpc (the Intel C++ Compiler) to compile the host code and to link the device and host code to create a .so. I finally completed that step without errors or warnings (relying on this tutorial).
(The reasoning as to why I'm using a .so has a fair amount of history and additional background. Suffice it to say that making a .so is required for my application.)
The tutorial points out that the compilation steps differ between generating a standalone executable (as I do in Nsight) and generating a device-linked library for inclusion in a .so. To get through the compilation, I had to add -lcudart, as described in the tutorial, as well as -lcuda to my icpc linking call (along with -L flags adding .../cuda-7.0/lib64 and .../cuda-7.0/lib64/stubs as the paths to those libraries).
NOTE: nvcc links in libcudart by default. I'm assuming it does the same for libcuda, since Nsight doesn't include either of these libraries in any of the compile and linking steps. As an aside, I do find it strange that although nvcc links them in by default, they don't show up in the output of ldd on the executable.
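One way to check which libcuda a built library would actually load is to filter the output of ldd. The helper below is only a sketch: the function name libcuda_path_from_ldd and the library name libmyfft.so in the usage comment are hypothetical, and it assumes the usual "name => path (address)" layout of ldd output on Linux.

```shell
# Reads `ldd` output on stdin and prints the resolved path of libcuda, if any.
# Lines for libcudart do not match, because the pattern requires ".so"
# directly after "libcuda".
libcuda_path_from_ldd() {
    awk '$1 ~ /^libcuda\.so/ && $2 == "=>" { print $3 }'
}

# Hypothetical usage, run in the same environment as the application:
#   ldd ./libmyfft.so | libcuda_path_from_ldd
```

If the printed path points into a toolkit directory rather than to the library installed with the display driver, the loader is not picking up the driver's libcuda.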
I also had to add --compiler-options '-fPIC' to my nvcc commands to avoid errors described here.
I have seen some chatter (for one example, see this post) about Intel/NVCC compatibility problems, but they appear to arise at compile time with older versions of NVCC, so I think I'm OK on that account.
Here are the compile commands for the three .cu files (all are identical except for the name of the .cu file and the name of the .o file):
nvcc
-ccbin g++
-Iinc
-I/path/to/cuda/samples/common/inc
-m64
-O3
-gencode arch=compute_20,code=sm_20
-gencode arch=compute_30,code=sm_30
-gencode arch=compute_35,code=sm_35
-gencode arch=compute_37,code=sm_37
-gencode arch=compute_50,code=sm_50
-gencode arch=compute_52,code=sm_52
-gencode arch=compute_52,code=compute_52
--relocatable-device-code=true
--compile
--compiler-options '-fPIC'
-o my_object_file1.o
-c my_source_code_file1.cu
And here are the flags I pass to the device linking step:
nvcc
-ccbin g++
-Iinc
-I/path/to/cuda/samples/common/inc
-m64
-O3
-gencode arch=compute_20,code=sm_20
-gencode arch=compute_30,code=sm_30
-gencode arch=compute_35,code=sm_35
-gencode arch=compute_37,code=sm_37
-gencode arch=compute_50,code=sm_50
-gencode arch=compute_52,code=sm_52
-gencode arch=compute_52,code=compute_52
--compiler-options '-fPIC'
--device-link
my_object_file1.o
my_object_file2.o
my_object_file3.o
-o my_device_linked_object_file.o
I probably don't need the -gencode flags for 30, 37, and 52, at least currently, but they shouldn't cause any problems, and eventually I will likely compile that way.
And here are the compile flags (minus the -o flag and all my -I flags) that I use for the .cc file that calls my CUDA library:
-c
-fpic
-D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64
-fno-operator-names
-D_REENTRANT
-D_POSIX_PTHREAD_SEMANTICS
-DM2KLITE -DGCC_
-std=gnu++98
-O2
-fp-model source
-gcc
-wd1881
-vec-report0
Finally, here are my linking flags:
-pthread
-shared
Any ideas on how to fix this problem?
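Don't add .../cuda7.0/lib64/stubs to LD_LIBRARY_PATH. If you do, you will pick up libcuda.so from there instead of from the driver. The stub library exists only to satisfy the linker at build time and is not a working driver, which is consistent with cudaGetDeviceCount reporting cudaErrorInsufficientDriver. (See this post.)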
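As a rough illustration of this advice, the sketch below removes any entry ending in /stubs from a colon-separated search path. The function name strip_stubs_dirs is hypothetical, and the /usr/local prefix in the comments is an assumed install location, not one taken from the question.

```shell
# Prints the colon-separated list in $1 with every entry that ends in
# /stubs (or /stubs/) removed. The body runs in a subshell so the IFS
# and globbing changes do not leak into the caller.
strip_stubs_dirs() (
    IFS=':'
    set -f
    result=""
    for dir in $1; do
        case "$dir" in
            */stubs|*/stubs/) continue ;;
        esac
        result="${result:+$result:}$dir"
    done
    printf '%s\n' "$result"
)

# Hypothetical usage before launching the application:
#   export LD_LIBRARY_PATH="$(strip_stubs_dirs "$LD_LIBRARY_PATH")"
# e.g. /usr/local/cuda-7.0/lib64:/usr/local/cuda-7.0/lib64/stubs
# becomes /usr/local/cuda-7.0/lib64
```

The stubs directory can still be passed with -L at link time; the point is only to keep it out of the run-time search path.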