
Why does the compiled binary get smaller when -gencode is used?



My GPU's compute capability is 3.0.

Binary size for each set of NVCC options:

Without any -gencode option:

1,780,520 bytes

-gencode=arch=compute_30,code=sm_30:

1,719,080 bytes (smaller)

-gencode=arch=compute_30,code=sm_30 -gencode=arch=compute_61,code=sm_61:

1,780,800 bytes


Solution

  • The NVIDIA nvcc documentation states:

    Example:

    nvcc x.cu
    

    is equivalent to:

    nvcc x.cu --gpu-architecture=compute_30 --gpu-code=sm_30,compute_30
    

    but in your case:

    nvcc x.cu -gencode=arch=compute_30,code=sm_30
    

    is equivalent to:

    nvcc x.cu --gpu-architecture=compute_30 --gpu-code=sm_30
    

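    which leaves compute_30 out of --gpu-code, so the binary contains only the machine code for sm_30 and no PTX code for the virtual architecture (compute_30). The missing PTX is why the binary is smaller than the default build.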
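
    One way to check this is to build both variants and inspect what each binary embeds. The following is only a sketch: it assumes a CUDA toolkit that still supports compute_30 (for example CUDA 10.x), a source file named x.cu as in the commands above, and cuobjdump on the PATH. The output names sass_only and sass_and_ptx are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes a toolkit that still supports compute_30
# and a source file named x.cu in the current directory.

# Machine code (SASS) for sm_30 only, no PTX: the smaller binary.
SASS_ONLY='-gencode=arch=compute_30,code=sm_30'

# SASS for sm_30 plus PTX for compute_30, which the quoted
# documentation describes as equivalent to the default build.
SASS_AND_PTX='-gencode=arch=compute_30,code=[sm_30,compute_30]'

if command -v nvcc >/dev/null 2>&1 && [ -f x.cu ]; then
    # Quote the variables so the shell does not treat [...] as a glob.
    nvcc x.cu "$SASS_ONLY" -o sass_only
    nvcc x.cu "$SASS_AND_PTX" -o sass_and_ptx

    # List the embedded PTX images; the first binary should list none.
    cuobjdump --list-ptx sass_only
    cuobjdump --list-ptx sass_and_ptx

    # Compare the file sizes in bytes.
    wc -c sass_only sass_and_ptx
else
    echo "nvcc or x.cu not found; skipping build"
fi
```

    If the explanation above holds, sass_and_ptx should be about the same size as the build without any -gencode option, and sass_only should match the smaller one.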