cuda, nvcc, gpu-shared-memory

Do register and shared memory usage depend on the compute capability targeted at compile time?


When I compile with nvcc -arch=sm_13, I get:

ptxas info    : Used 29 registers, 28+16 bytes smem, 7200 bytes cmem[0], 8 bytes cmem[1] 

When I compile with nvcc -arch=sm_20, I get:

ptxas info    : Used 34 registers, 60 bytes cmem[0], 7200 bytes cmem[2], 4 bytes cmem[16] 

I thought all kernel parameters were passed in shared memory, but for sm_20 that doesn't seem to be the case. Are they perhaps passed in registers instead? My function signature looks like this:

__global__ void func(double *, double , double, int)

Thanks so far!


Solution

  • On compute capability 2.x devices, kernel arguments are stored in constant memory rather than in shared memory, which is why the smem figure disappears from the sm_20 output and cmem[0] appears instead. The difference in register count is probably due to differences in the code generated for math library functions between the two targets. Does the kernel use things like transcendental functions or sqrt?
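
The numbers in the two ptxas lines can be checked by hand. The following is a minimal sketch in plain host C++, not anything nvcc itself runs. It assumes a 64-bit build (8-byte pointer) and naturally aligned arguments, and it assumes that on sm_20 the arguments sit after a 32-byte driver-reserved area at the start of cmem[0]. That offset is an assumption made here because it matches the reported 60 bytes. The names Param, kFuncParams, kSm20ParamBase, alignUp and paramBlockBytes are hypothetical helpers for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of one kernel argument: its size and alignment.
struct Param {
    std::size_t size;
    std::size_t align;
};

// Arguments of func(double *, double, double, int).
// Assumption: 64-bit build, so the pointer occupies 8 bytes.
static const Param kFuncParams[] = {
    {8, 8},  // double *
    {8, 8},  // double
    {8, 8},  // double
    {4, 4},  // int
};

// Assumed size of the area reserved ahead of the arguments in cmem[0] on sm_20.
static const std::size_t kSm20ParamBase = 32;

// Round offset up to the next multiple of alignment (a power of two).
inline std::size_t alignUp(std::size_t offset, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Total bytes occupied by the arguments when each one is placed
// at the next offset that satisfies its alignment.
inline std::size_t paramBlockBytes(const Param *params, std::size_t count) {
    std::size_t offset = 0;
    for (std::size_t i = 0; i < count; ++i) {
        offset = alignUp(offset, params[i].align) + params[i].size;
    }
    return offset;
}
```

Under these assumptions, paramBlockBytes(kFuncParams, 4) gives 28, the first smem figure in the sm_13 output, and adding kSm20ParamBase gives 60, the cmem[0] figure in the sm_20 output. So the same 28 bytes of arguments appear to have moved from shared memory to constant memory, not into registers.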