Can I get the CUDA compute capability (version) at compile time via #define?

How can I get the CUDA compute capability (version) at compile time through a preprocessor #define? For example, if I use __ballot and compile with

nvcc -c -gencode arch=compute_20,code=sm_20  \
        -gencode arch=compute_13,code=sm_13 \
        source.cu

Can I get the compute capability version in my code through a #define, so that I can choose between a code branch that uses __ballot and one that does not?

Solution

Yes. First, it's best to understand what happens when you use -gencode. NVCC will compile your input device code multiple times, once for each device target architecture. So in your example, NVCC will run compilation stage 1 once for compute_20 and once for compute_13.

When nvcc compiles a .cu file, two preprocessor macros are relevant: __CUDACC__ and __CUDA_ARCH__. For __CUDACC__, what matters is only whether it is defined: it is defined when nvcc is the compiler, and not defined when it isn't.
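As a minimal sketch of how __CUDACC__ is commonly used, the fragment below lets one function definition build under both nvcc and a plain C++ compiler. The names HD, lanemask_lt and check_lanemask are hypothetical, chosen only for illustration.

```cpp
#include <cassert>

// HD is a hypothetical convenience macro, not something the toolkit provides.
// Under nvcc (__CUDACC__ defined) it marks a function as callable from both
// host and device; under any other compiler it expands to nothing.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper: a bitmask with one bit set for every lane index
// below `lane`. Only valid for lane in [0, 31].
HD inline unsigned int lanemask_lt(int lane) {
    return (1u << lane) - 1u;
}

// Host-side sanity check of the helper.
inline void check_lanemask() {
    assert(lanemask_lt(0) == 0u);            // no lanes below lane 0
    assert(lanemask_lt(5) == 31u);           // lanes 0..4 -> 0b11111
    assert(lanemask_lt(31) == 0x7FFFFFFFu);  // lanes 0..30
}
```

Guarding the qualifiers this way keeps the header usable from ordinary .cpp files, where __host__ and __device__ would otherwise be unknown identifiers.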

__CUDA_ARCH__ is defined only while device code is being compiled (it is undefined during the host pass), and its integer value identifies the compute capability being targeted:

100 = compute_10
110 = compute_11
200 = compute_20

etc. To quote the NVCC documentation included with the CUDA Toolkit:

The architecture identification macro __CUDA_ARCH__ is assigned a three-digit value string xy0 (ending in a literal 0) during each nvcc compilation stage 1 that compiles for compute_xy. This macro can be used in the implementation of GPU functions for determining the virtual architecture for which it is currently being compiled. The host code (the non-GPU code) must not depend on it.
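To make the xy0 encoding concrete, here is a small sketch that splits such a value into its major and minor parts. The macro names ARCH_MAJOR, ARCH_MINOR and TARGETS_CC2_OR_LATER are assumptions made for this example; they are not defined by nvcc.

```cpp
#include <cassert>

// Hypothetical helper macros (not provided by the CUDA toolkit) that split an
// xy0-style value such as 200 or 130 into its major (x) and minor (y) parts.
#define ARCH_MAJOR(arch) ((arch) / 100)
#define ARCH_MINOR(arch) (((arch) / 10) % 10)

// Because these expand to plain integer expressions, they also work in #if.
// The defined() check keeps the host pass, where __CUDA_ARCH__ is absent,
// on the 0 branch explicitly.
#if defined(__CUDA_ARCH__) && ARCH_MAJOR(__CUDA_ARCH__) >= 2
#define TARGETS_CC2_OR_LATER 1
#else
#define TARGETS_CC2_OR_LATER 0
#endif

// Host-side checks of the decoding against the values listed in the answer.
inline void check_arch_decoding() {
    assert(ARCH_MAJOR(200) == 2 && ARCH_MINOR(200) == 0);  // compute_20
    assert(ARCH_MAJOR(130) == 1 && ARCH_MINOR(130) == 3);  // compute_13
    assert(ARCH_MAJOR(110) == 1 && ARCH_MINOR(110) == 1);  // compute_11
}
```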

So, in your case where you want to use __ballot(), you can do this:

....
#if __CUDA_ARCH__ >= 200
    // predicate and lanemask are assumed to be defined earlier in the kernel
    int b = __ballot(predicate);
    int p = __popc(b & lanemask);
#else
    // do something else for earlier architectures
#endif
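One detail worth spelling out: in an #if expression an undefined macro evaluates to 0, so during the host pass the snippet above quietly takes the #else branch. If a function is compiled for both host and device, it may be clearer to separate the three cases explicitly. The sketch below assumes hypothetical names (HD, VotePath, selected_vote_path) and only reports which branch was compiled; the real per-branch code would go where the return statements are.

```cpp
#include <cassert>

// Hypothetical macro: host/device qualifiers under nvcc, nothing elsewhere.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical labels for the three compilation situations.
enum VotePath { kHost, kLegacyDevice, kBallotDevice };

// Reports which preprocessor branch was selected for the current pass.
HD inline VotePath selected_vote_path() {
#if !defined(__CUDA_ARCH__)
    return kHost;          // host pass, or a non-CUDA compiler
#elif __CUDA_ARCH__ >= 200
    return kBallotDevice;  // device pass for compute_20 or later
#else
    return kLegacyDevice;  // device pass for compute_1x
#endif
}

// On the host, the first branch is the one that gets compiled.
inline void check_host_path() {
    assert(selected_vote_path() == kHost);
}
```

With the -gencode options from the question, nvcc would compile the function body once per device target plus once for the host, each time keeping a different return statement.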