I have an application that uses the GPU and runs on different machines. I currently pass NVCC the parameters -arch=compute_xx -code=sm_xx manually, according to the GPU model installed on the machine I am running on.
I want to write an automation that extracts those values from the host machine, so that I no longer need to specify them by hand. Is there a way to do that automatically?
With the CUDA runtime API (C++), you can query a device's compute capability major and minor version as follows:
#include <cstdio>
#include <cuda_runtime.h>

cudaDeviceProp deviceProp;
cudaGetDeviceProperties(&deviceProp, /* device index */ 0);
std::printf("%d.%d\n", deviceProp.major, deviceProp.minor);
This will print, for example, "6.1" on a Pascal card.
If you have the CUDA demo suite in your installation, the deviceQuery executable in the extras/demo_suite directory uses this API to fetch the compute capability version:
$ /path/to/cuda/extras/demo_suite/deviceQuery | grep 'CUDA Capability'
CUDA Capability Major/Minor version number: 6.1
CUDA Capability Major/Minor version number: 6.1
However, when building CUDA programs intended to run on a range of possible GPUs, the best way to handle this is to build in the PTX and/or binary device code for each architecture / compute capability you want to support, like so:
nvcc x.cu \
--generate-code arch=compute_50,code=sm_50 \
--generate-code arch=compute_50,code=sm_52 \
--generate-code arch=compute_53,code=sm_53
This is further described in the NVCC docs. Also note that adding an entry with code=compute_XX, in addition to your code=sm_XX entries, will embed the portable PTX for that virtual architecture into your program, enabling JIT compilation and thus support for newer architectures you haven't explicitly included in your compilation. You may find that something like -arch=compute_50 -code=compute_50 is all you need to run on all Maxwell cards and newer, though not necessarily with the best-tuned code for some newer GPUs.
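If a sufficiently recent NVIDIA driver is installed, nvidia-smi can report the compute capability directly, which avoids building or shipping deviceQuery at all. A sketch, assuming the compute_cap query field (only available in newer driver releases; check nvidia-smi --help-query-gpu on your machine first):

```shell
#!/bin/sh
# Query the first GPU's compute capability, e.g. "8.6", and strip the dot -> "86".
# compute_cap is an assumed query field; older drivers do not support it.
CC=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -1 | tr -d '.')

nvcc x.cu --generate-code arch=compute_"$CC",code=sm_"$CC"
```

This only targets the GPU present at build time, so it trades the fat-binary portability described above for a smaller, machine-specific build.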