__CUDA__ undefined when running nvcc for a .cu file

I am trying to use nvcc to compile a .cu file that includes a .cuh file containing a templated device function. The same .cuh file is also included from a .cpp file, so I want to keep the templated device function from being visible on the .cpp side. To do so, I am using this guard:

foo.cuh

#if defined(__CUDA__) && defined(__CUDA_ARCH__)

template <typename T>
__device__ void foo(){...}
#endif

However, when I call the device function from a kernel, the compiler reports that the function doesn't exist while compiling the .cu file. The .cu file compiles fine if I remove the __CUDA__ check, but then the build fails later, when the .cpp file is compiled.

Am I missing something in the check?

Solution

nvcc does not define a __CUDA__ macro. Unless you define it yourself somewhere, the #if condition is always false, so the preprocessor always skips the code inside it.

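Perhaps you want __CUDACC__, which nvcc defines when it compiles a CUDA source (.cu) file and which a host compiler such as g++ does not define when it compiles a .cpp file.

Example: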

$ cat test.cuh
#ifdef __CUDACC__
__device__ void foo(){};
#endif

void f();

$ cat main.cpp
#include <test.cuh>

int main(){
  f();
}
$ cat test.cu
#include <test.cuh>
__global__ void k(){foo();}

void f(){

  k<<<1,1>>>();
  cudaDeviceSynchronize();
}
$ nvcc -I. -o test test.cu main.cpp
$ g++  -I. -c main.cpp
$ nvcc -I. -o test test.cu main.o
$ compute-sanitizer ./test
========= COMPUTE-SANITIZER
========= ERROR SUMMARY: 0 errors
$
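To see why the same guard keeps the .cpp build clean, here is a minimal sketch of a shared header as a plain host compiler such as g++ sees it, assuming (as in the g++ step above) that __CUDACC__ is not defined there. The names twice and cuda_only_code_visible are hypothetical and only used for illustration:

```cpp
// Stand-in for a shared header such as test.cuh (names are hypothetical).

#ifdef __CUDACC__
// Seen only when nvcc compiles a CUDA source (.cu) file.
template <typename T>
__device__ T twice(T x) { return x + x; }

constexpr bool cuda_only_code_visible() { return true; }
#else
// A plain host compiler building a .cpp file does not define __CUDACC__,
// so the preprocessor removes the __device__ template above and the host
// compiler never has to understand the __device__ keyword.
constexpr bool cuda_only_code_visible() { return false; }
#endif

// Host-side declarations outside the guard are visible to both compilers.
void f();
```

When the same header is included from a .cu file compiled by nvcc, the first branch is taken instead, so kernels in that file can call twice<T>().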

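(The __CUDA_ARCH__ check is not needed here either. nvcc defines __CUDA_ARCH__ only while compiling device code, so it is meant for selecting device-specific code paths, not for hiding CUDA-only declarations from the host compiler; __CUDACC__ alone already keeps them out of the .cpp build.)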
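If you do need __CUDA_ARCH__, a typical use is choosing a code path inside a function that nvcc compiles for both host and device. A sketch, where the HD macro and compiled_for_device() are hypothetical names chosen for illustration:

```cpp
// HD expands to __host__ __device__ under nvcc and to nothing under a plain
// host compiler, so the same header can be included from .cu and .cpp files.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Under nvcc this function is compiled for the host and, separately, for the
// device; __CUDA_ARCH__ is defined only during the device compilation.
HD constexpr bool compiled_for_device() {
#ifdef __CUDA_ARCH__
    return true;
#else
    return false;
#endif
}
```

Called from host code this returns false; called from a kernel it returns true, because each call site uses the version compiled for its own side.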
