CUDA compiler fails to detect a host function being called on the (GPU) device

Please look at this code:

void bar() {}

__host__ __device__ void foo()
{
  bar();
}

__global__ void kernel()
{
  foo();
}

int main()
{
  kernel<<<1, 1>>>();
  gpuErrchk(cudaPeekAtLastError());

  gpuErrchk(cudaDeviceSynchronize());

  return 0;
}

I spent hours trying to track down the runtime error "an illegal memory access was encountered". As it turned out, the cause is the bar() function: it is not declared as __device__. But the code compiles! It produces a warning, but it compiles! The warning says:

warning: calling a __host__ function("bar") from a __host__ __device__
function("Test::foo") is not allowed

Since the compilation for my project produces a lot of output, I simply didn't see that warning. But if I remove the __device__ attribute from the foo() function, I get the expected error:

error: identifier "foo" is undefined in device code

The question is why the compiler prints only a warning and how to turn it into an error?

Solution

The question is why the compiler prints only a warning and how to turn it into an error?

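The compiler prints only a warning because, at the point where it compiles the calling function, it doesn't know whether that function will actually be called at runtime in the objectionable configuration (i.e. on, or from, device code). A __host__ __device__ function that is only ever called from host code can call a host function without any problem.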
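To illustrate the point, here is a minimal sketch (not taken from the question) of a __host__ __device__ function that legitimately needs a host-only call on the host side. It assumes the usual __CUDA_ARCH__ macro, which nvcc defines only during the device compilation pass, so the device pass never sees the call to bar(). The printf messages and the body of bar() are made up for illustration.

```cuda
#include <cstdio>

// Host-only helper; the body is a placeholder for illustration.
void bar() { printf("bar() ran on the host\n"); }

__host__ __device__ void foo()
{
#ifndef __CUDA_ARCH__
  // Host compilation pass: calling a __host__ function is legal here.
  bar();
#else
  // Device compilation pass: bar() does not exist on the device, so skip it.
  printf("foo() ran on the device, bar() skipped\n");
#endif
}

__global__ void kernel() { foo(); }

int main()
{
  foo();               // host-side call, reaches bar()
  kernel<<<1, 1>>>();  // device-side call, takes the #else branch

  // Check for launch errors first, then for errors during execution.
  cudaError_t err = cudaGetLastError();
  if (err == cudaSuccess) err = cudaDeviceSynchronize();
  if (err != cudaSuccess) {
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    return 1;
  }
  return 0;
}
```

Whether guarding the call like this is appropriate depends on what foo() is supposed to do on the device; it is one option, not the only one.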

and how to turn it into an error?

According to the nvcc manual, you can add either:

-Werror all-warnings

to flag all warnings as errors, or:

-Werror cross-execution-space-call

to flag only this type of warning as an error.
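As a rough sketch of how this fits together, the version of the question's code below should build cleanly with the stricter flag, because bar() is now marked __host__ __device__. The file name test.cu in the build comment is a placeholder, and the gpuErrchk macro is my own stand-in, since the question does not show its definition.

```cuda
// Build with the warning promoted to an error (test.cu is a placeholder name):
//   nvcc -Werror cross-execution-space-call -o test test.cu
#include <cstdio>
#include <cstdlib>

// Assumed stand-in for the question's gpuErrchk, which is not shown there.
#define gpuErrchk(call)                                          \
  do {                                                           \
    cudaError_t err_ = (call);                                   \
    if (err_ != cudaSuccess) {                                   \
      fprintf(stderr, "CUDA error: %s (%s:%d)\n",                \
              cudaGetErrorString(err_), __FILE__, __LINE__);     \
      exit(EXIT_FAILURE);                                        \
    }                                                            \
  } while (0)

// bar() is now compiled for both host and device,
// so foo() may call it from either side.
__host__ __device__ void bar() {}

__host__ __device__ void foo()
{
  bar();
}

__global__ void kernel()
{
  foo();
}

int main()
{
  kernel<<<1, 1>>>();
  gpuErrchk(cudaPeekAtLastError());    // launch errors
  gpuErrchk(cudaDeviceSynchronize());  // errors during kernel execution
  return 0;
}
```

If bar() is left as a plain host function instead, the same command line should stop the build at the cross-execution-space call rather than letting it through with a warning.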

Also see here. To those who will ask why I didn't flag this as a dupe: that other question does not ask, and its answer does not explain, why the compiler behaves this way.