Tags: cuda, gpgpu, nvcc

Can I get the CUDA compute capability (version) at compile time via #define?


How can I get the CUDA compute capability (version) at compile time via a #define? For example, if I use __ballot() and compile with

nvcc -c -gencode arch=compute_20,code=sm_20 \
        -gencode arch=compute_13,code=sm_13 \
        source.cu

can I get the compute capability version in my code via a #define, so that I can choose between a code path that uses __ballot() and one that doesn't?


Solution

  • Yes. First, it's best to understand what happens when you use -gencode. NVCC will compile your input device code multiple times, once for each device target architecture. So in your example, NVCC will run compilation stage 1 once for compute_20 and once for compute_13.

    When nvcc compiles a .cu file, it defines two preprocessor macros, __CUDACC__ and __CUDA_ARCH__. Only the presence of __CUDACC__ matters: it is defined when nvcc is the compiler and undefined otherwise.

    __CUDA_ARCH__ is defined to an integer value representing the virtual architecture (compute_xy) being compiled, and only during the device compilation passes; it is not defined while the host code is compiled.
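    A minimal sketch of how the two macros behave, assuming a single .cu file compiled with nvcc; the names compiled_arch and report_arch are hypothetical. In the host pass __CUDA_ARCH__ is undefined, so the function returns 0; in each device pass it returns that pass's value, e.g. 130 for compute_13 or 200 for compute_20.

```cuda
// __CUDACC__ is defined in every nvcc pass (host and device), so this
// guard only rejects compilation by a plain host compiler.
#ifndef __CUDACC__
#error "This file must be compiled with nvcc"
#endif

// Hypothetical helper: reports the __CUDA_ARCH__ value of the compilation
// pass that produced the code currently executing.
__host__ __device__ int compiled_arch()
{
#ifdef __CUDA_ARCH__
    return __CUDA_ARCH__;   // device pass: e.g. 130 for compute_13, 200 for compute_20
#else
    return 0;               // host pass: __CUDA_ARCH__ is not defined here
#endif
}

// Hypothetical kernel: stores the device-side value so the host can read it back.
__global__ void report_arch(int *out)
{
    *out = compiled_arch();
}
```

    Launching report_arch<<<1, 1>>>(d_out) and copying *d_out back shows which of the embedded -gencode targets was actually selected for the current GPU, while calling compiled_arch() from host code always yields 0.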

    • 100 = compute_10
    • 110 = compute_11
    • 130 = compute_13
    • 200 = compute_20

    The same pattern continues for other architectures. To quote the NVCC documentation included with the CUDA Toolkit:

    The architecture identification macro __CUDA_ARCH__ is assigned a three-digit value string xy0 (ending in a literal 0) during each nvcc compilation stage 1 that compiles for compute_xy. This macro can be used in the implementation of GPU functions for determining the virtual architecture for which it is currently being compiled. The host code (the non-GPU code) must not depend on it.
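    The "xy0" rule is simple arithmetic: compute_xy maps to x * 100 + y * 10. The sketch below encodes that and, as an assumption about how you might use it, compares it with the compute capability the runtime reports through cudaGetDeviceProperties(). Both helper names (arch_macro_value, device_arch_value) are hypothetical.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: the __CUDA_ARCH__ value nvcc uses for
// compute_<major><minor>, i.e. the digits "xy" followed by a literal 0.
__host__ __device__ constexpr int arch_macro_value(int major, int minor)
{
    return major * 100 + minor * 10;
}

// Hypothetical helper: the same encoding applied to the compute capability
// of device `dev`, or -1 if the device properties cannot be read.
int device_arch_value(int dev)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
        return -1;
    return arch_macro_value(prop.major, prop.minor);
}
```

    The two values need not match: a binary built only for compute_20/sm_20 also runs on a compute capability 2.1 device, where device_arch_value(0) gives 210 but __CUDA_ARCH__ in the running code is still 200. That is why the macro describes the compilation target, not the GPU.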

    So, in your case where you want to use __ballot(), you can do this:

    ....
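    #if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 200
        // __ballot() returns a 32-bit mask with bit N set if `predicate`
        // is nonzero in lane N of the warp (`predicate` is a placeholder)
        unsigned int b = __ballot(predicate);
        // __popc() counts the set bits; `lanemask` comes from the surrounding code
        int p = __popc(b & lanemask);
    #else
        // do something else for earlier architectures (and for the host pass,
        // where __CUDA_ARCH__ is not defined)
    #endif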
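    A self-contained version of the same pattern might look like the sketch below. The kernel name warp_rank, the block size kBlock and the task (each thread counting the flagged lanes below it in its warp, the core step of warp-level stream compaction) are illustrative assumptions, and it assumes blockDim.x == kBlock, a multiple of 32, so every warp is full. Since CUDA 9, __ballot() is deprecated (and unavailable when targeting compute capability 7.x and higher), so the sketch switches to __ballot_sync() based on __CUDACC_VER_MAJOR__; the compute_1x branch counts through shared memory instead.

```cuda
#include <cuda_runtime.h>

// Assumption: kernels are launched with blockDim.x == kBlock (a multiple of 32).
constexpr int kBlock = 256;

// Hypothetical kernel: rank[i] = number of lanes below i in the same warp
// whose flag is nonzero (an exclusive per-warp count).
__global__ void warp_rank(const int *flags, int *rank, int n)
{
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x & 31;
    // Out-of-range threads still participate (with predicate 0) so that
    // every lane reaches the vote or __syncthreads() below.
    int pred = (tid < n) ? (flags[tid] != 0) : 0;

#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 200
    // Fermi and later: one warp vote plus a population count.
#if defined(__CUDACC_VER_MAJOR__) && __CUDACC_VER_MAJOR__ >= 9
    unsigned int b = __ballot_sync(0xffffffffu, pred);  // full warp participates
#else
    unsigned int b = __ballot(pred);
#endif
    unsigned int lanemask_lt = (1u << lane) - 1u;  // bits of the lanes below this one
    int r = __popc(b & lanemask_lt);
#else
    // compute_1x (and the host pass): no __ballot(), so use shared memory.
    __shared__ int s_pred[kBlock];
    s_pred[threadIdx.x] = pred;
    __syncthreads();
    int warp_base = threadIdx.x & ~31;
    int r = 0;
    for (int i = 0; i < lane; ++i)
        r += s_pred[warp_base + i];
#endif

    if (tid < n)
        rank[tid] = r;
}
```

    A launch such as warp_rank<<<(n + kBlock - 1) / kBlock, kBlock>>>(d_flags, d_rank, n) keeps every warp full, because the grid is rounded up to whole blocks and the surplus threads simply vote 0.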