Tags: c++, cmake, compilation, cuda, cuda-gdb

Questions about CUDA macro __CUDA_ARCH__


I have a simple CUDA program in ttt.cu:

#include <iostream>
__global__ void example(){
   printf("__CUDA_ARCH__: %d \n", __CUDA_ARCH__);
}
int main(){
example<<<1,1>>>();
}

with CMakeLists.txt:

cmake_minimum_required(VERSION 3.18)
project(Hello)
find_package(CUDA REQUIRED)

cuda_add_executable(sss ttt.cu)

Then I got the error: identifier "__CUDA_ARCH__" is undefined. I would like to know why this happens and what I should do to make __CUDA_ARCH__ valid. Also, can __CUDA_ARCH__ be used in host code within a .h header file?

Update:

I intended to use the following CMake file to build for CUDA architecture 75 (__CUDA_ARCH__ = 750), but it always results in __CUDA_ARCH__ = 300 (2080 Ti with CUDA 10.1). I tried both set_property and target_compile_options; neither worked.

cmake_minimum_required(VERSION 3.18)
project(Hello)
find_package(CUDA REQUIRED)
cuda_add_executable(oounne ttt.cu)
set_property(TARGET oounne PROPERTY CUDA_ARCHITECTURES 75)
#target_compile_options(oounne PRIVATE $<$<COMPILE_LANGUAGE:CUDA>:-gencode arch=compute_75,code=sm_75>)


Solution

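  • __CUDA_ARCH__ is a compiler macro. nvcc defines it only while compiling device code for a particular architecture; during the host compilation pass of the same .cu file it is undefined, which is where the "undefined identifier" error comes from.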

    can we use valid __CUDA_ARCH__ in host code

    No, it is intended to be used in device code only:

    The host code (the non-GPU code) must not depend on it.

    You cannot print a compiler macro the way you are imagining; it is not an ordinary numerical variable defined in C++. You could have the compiler report its value in a diagnostic, but that would print at compile time, not at run time.

    To print at run-time, you could do something like this:

    $ cat t2.cu
    #include <cstdio>
    #define STR_HELPER(x) #x
    #define STR(x) STR_HELPER(x)
    
    __device__ void print_arch(){
      const char my_compile_time_arch[] = STR(__CUDA_ARCH__);
      printf("__CUDA_ARCH__: %s\n", my_compile_time_arch);
    }
    __global__ void example()
    {
       print_arch();
    }
    
    int main(){
    
    example<<<1,1>>>();
    cudaDeviceSynchronize();
    }
    $ nvcc -o t2 t2.cu
    
    $ ./t2
    __CUDA_ARCH__: 520
    $
    

    Note that there are quite a few questions under the cuda tag here that discuss __CUDA_ARCH__; you may wish to review some of them.
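
On the header question: a header shared by .cu and .cpp files can mention __CUDA_ARCH__ as long as every use sits behind a preprocessor guard, so the host pass never sees the bare identifier. Below is a minimal sketch of that pattern. The file name arch_info.h, the macro ARCH_INFO_HD, and the function compiled_arch() are hypothetical names chosen for illustration and are not from the original post.

```cpp
// arch_info.h (hypothetical name): can be included from .cu and plain .cpp files.
#ifndef ARCH_INFO_H
#define ARCH_INFO_H

// __CUDACC__ is defined when nvcc is compiling the file. A plain C++
// compiler does not know the CUDA qualifiers, so drop them in that case.
#ifdef __CUDACC__
#define ARCH_INFO_HD __host__ __device__
#else
#define ARCH_INFO_HD
#endif

// Returns the architecture that the current compilation pass targets,
// or 0 when compiled for the host, where __CUDA_ARCH__ is not defined.
ARCH_INFO_HD inline int compiled_arch() {
#ifdef __CUDA_ARCH__
    return __CUDA_ARCH__;  // device pass, e.g. 750 when built for compute_75
#else
    return 0;              // host pass, or a non-CUDA compiler
#endif
}

#endif  // ARCH_INFO_H
```

Called from device code, compiled_arch() yields the value baked in for that device pass. Called from host code it yields 0, which says nothing about the GPU actually installed, so host logic still must not depend on the macro.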