I'm compiling a CUDA program using nvcc
with options -arch=20 -code=20
for a GeForce 310 GPU having compute capability 1.2. The program seems to run normally as follows.
wangli@wangli-desktop:~/wangliC2050/1D-EncodeV6.1$ make
nvcc -O --ptxas-options=-v 1D-EncodeV6.1.cu -o 1D-EncodeV6.1 -I../../NVIDIA_GPU_Computing_SDK/C/common/inc -I../../NVIDIA_GPU_Computing_SDK/shared/inc -arch=compute_20 -code=sm_20
ptxas info : Compiling entry function '_Z6EncodePhPjS0_S_S_' for 'sm_20'
ptxas info : Function properties for _Z6EncodePhPjS0_S_S_
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 14 registers, 52 bytes cmem[0]
wangli@wangli-desktop:~/wangliC2050/1D-EncodeV6.1$ ./1D-EncodeV6.1
########################### Encoding start (loopCount=10)#######################
#p n size averageTime(s) averageThroughput(MB/s) errorRate(0~1)
#================= Encode on GPU v6.1 ===============
4 4 4 0.000294 0.051837 100.000000
#################### Encoding stop #########################
So, I wonder:
nvcc
options -arch=compute_20 -code=sm_20
which do not match the compute capability 1.2 of the card?-arch
option will differ from that of the -code
option?Thanks.
A CUDA executable typically contains 2 types of program data: SASS code which is basically GPU machine code, and PTX which is an intermediate code (although it's pretty close to machine code). As long as PTX code is present in the executable, then if the driver decides that a proper SASS binary is not available for the GPU that the code will actually run on, it will do a "JIT-compile" step at application launch, to create the necessary binary code appropriate for the device in question, using the PTX code in the application package.
This is what is happening in your case.
If arch != code, then you're creating device code that architecturally conforms to the arch type, but is compiled to use machine level instructions that are associated with the code type. For example, if I compile for arch = 1.2 and code = 2.0, I cannot use double
types (they will be demoted to float
, because double
is not supported in a 1.2 architecture) but the SASS machine code generated will be ready to execute on a cc 2.0 device, and will not require a JIT-compile step for that kind of device.
The NVCC manual has more information particularly the section on steering code generation.