I am trying to compile a .cu file that includes a .cuh file containing a templated device function, using nvcc. The same .cuh file is also included from a .cpp file, so I am trying to keep the templated device function from being visible on the .cpp side. To do so I am using:
foo.cuh
#if defined(__CUDA__) && defined(__CUDA_ARCH__)
template <typename T>
__device__ void foo(){...}
#endif
However, when I try to use the device function from a kernel, it says the function doesn't exist while compiling the .cu. The .cu compiles fine if I remove the __CUDA__ check, but in that case it fails later, when compiling the .cpp file.
Am I missing something in the check?
There is no __CUDA__ macro defined by nvcc. Therefore, unless you defined it yourself somewhere, the code guarded by your #if directive will always be skipped.
Perhaps you want __CUDACC__?
example:
$ cat test.cuh
#ifdef __CUDACC__
__device__ void foo(){}
#endif
void f();
$ cat main.cpp
#include <test.cuh>
int main(){
f();
}
$ cat test.cu
#include <test.cuh>
__global__ void k(){foo();}
void f(){
k<<<1,1>>>();
cudaDeviceSynchronize();
}
$ nvcc -I. -o test test.cu main.cpp
$ g++ -I. -c main.cpp
$ nvcc -I. -o test test.cu main.o
$ compute-sanitizer ./test
========= COMPUTE-SANITIZER
========= ERROR SUMMARY: 0 errors
$
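(The __CUDA_ARCH__ macro is not needed here either. It is defined only during nvcc's device compilation pass, so it is not a test for "this file is being compiled as CUDA source".)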
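The same guard should carry over to the templated function from the question. Below is a minimal sketch, assuming a single source file that may be built either by nvcc as CUDA source or by a plain host compiler. The body of foo, the parameter of kernel k, and the helper device_code_visible() are hypothetical, added only for illustration; they are not from the original code.

```cpp
// Hypothetical sketch: one file that compiles both as CUDA source (nvcc)
// and as plain C++ (e.g. g++), using __CUDACC__ as the only guard.

#ifdef __CUDACC__
// Visible only when nvcc compiles this as CUDA source.
// A host-only compiler never sees __device__ or __global__.
template <typename T>
__device__ T foo(T x) { return x + T(1); }

// Kernel that uses the templated device function.
template <typename T>
__global__ void k(T* out) { *out = foo<T>(T(41)); }
#endif

// Ordinary host function, visible to every compiler.
// Reports whether the guarded CUDA code above was compiled in.
inline bool device_code_visible() {
#ifdef __CUDACC__
    return true;
#else
    return false;
#endif
}
```

Built with a host compiler, device_code_visible() returns false and the templates are skipped entirely; built with nvcc as a .cu file, the templates are available to kernels in that translation unit.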