I'm trying to understand how CUDA assert works under the hood. The assert
macro calls the __assert_fail
function, of which I can find the following signatures:
/usr/include/assert.h:extern void __assert_fail (const char *__assertion, const char *__file,
/usr/local/cuda-10.2/targets/x86_64-linux/include/crt/common_functions.h:extern __host__ __device__ __cudart_builtin__ void __assert_fail(
I can see they have the same signature, however the CUDA version has __host__ __device__
qualifiers.
Normally it's not possible to overload a function based on __host__ __device__
, since they are not part of the function signature (unless compiling CUDA code with Clang instead of NVCC):
extern void foo();
extern __host__ __device__ void foo();
Compiling with NVCC and warnings as errors:
main.cu(4): error: a __host__ function("foo") redeclared with __host__ __device__, hence treated as a __host__ __device__ function
So, why isn't this warning/error showing up with assert
? Is there any additional magic to make this work?
Consider the following example program main.cu which is compiled via nvcc --keep main.cu -o main
#include <cassert>
__global__ void kernel(){
assert(false);
}
int main(){
kernel<<<1,1>>>();
cudaDeviceSynchronize();
assert(false);
}
Compilation of a CUDA program is performed in several steps as described in the toolkit documentation https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#cuda-compilation-trajectory
Compiling with the flag --keep
keeps all the intermediate files which can be searched for __assert_fail
.
File main.cpp1.ii
# 66 "/usr/include/assert.h" 3 4
extern "C" {
extern void __assert_fail (const char *__assertion, const char *__file,
unsigned int __line, const char *__function)
throw () __attribute__ ((__noreturn__));
...
}
# 169 "/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/common_functions.h"
extern "C"
{
# 197 "/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/common_functions.h"
extern __attribute__((host)) __attribute__((device)) __attribute__((cudart_builtin)) void __assert_fail(
const char *, const char *, unsigned int, const char *)
...
}
Note the attribute cudart_builtin
in the version injected by nvcc. With this attribute, there is no warning of redeclaration.
For your case, this program produces the warning, as you observed
extern void foo();
extern __host__ __device__ void foo();
int main(){
return 0;
}
However, the following compiles without warning.
extern void foo();
extern __host__ __device__ __attribute__((cudart_builtin))
void foo();
int main(){
return 0;
}
So, you could use this attribute. However, it may have unknown side effects, and foo is in fact no builtin function of the cuda runtime. You should not use this attribute on your functions.
CUDA 11.2 has been used.