
Can I obtain the amount of allocated dynamic shared memory from within a kernel?


On the host side, I can save the amount of dynamic shared memory I intend to launch a kernel with, and use it; I can even pass that amount as an argument to the kernel. But is there a way to get it directly from device code, without help from the host side? That is, can a kernel determine, while it runs, how much dynamic shared memory it has available?


Solution

  • Yes, there's a special register holding that value, named %dynamic_smem_size. You can obtain this register's value in your CUDA C/C++ code by wrapping some inline PTX with a getter function:

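    __device__ unsigned dynamic_smem_size()
    {
        unsigned ret;
        // In inline asm, a literal '%' must be written as "%%";
        // %0 refers to the output operand `ret`.
        asm volatile ("mov.u32 %0, %%dynamic_smem_size;" : "=r"(ret));
        return ret;
    }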
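
A minimal end-to-end sketch of how the getter might be used. The kernel `report_dynamic_smem` and the host helper `query_dynamic_smem` are hypothetical names made up for illustration: the host launches one block with a chosen amount of dynamic shared memory, the kernel uses the register value as a loop bound and records it, and the host copies it back. Error handling is deliberately minimal.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Getter from the answer; "%%" emits a literal '%' into the PTX string.
__device__ unsigned dynamic_smem_size()
{
    unsigned ret;
    asm volatile ("mov.u32 %0, %%dynamic_smem_size;" : "=r"(ret));
    return ret;
}

// Hypothetical kernel: it never receives the size as an argument,
// it reads it from the special register instead.
__global__ void report_dynamic_smem(unsigned* out)
{
    extern __shared__ unsigned char smem[];   // dynamic shared memory
    unsigned n = dynamic_smem_size();         // bytes available in smem[]

    // Use the size as a bound, e.g. to zero the buffer cooperatively.
    for (unsigned i = threadIdx.x; i < n; i += blockDim.x) {
        smem[i] = 0;
    }
    if (threadIdx.x == 0) {
        *out = n;
    }
}

// Hypothetical host helper: launches one block of 32 threads with `bytes`
// of dynamic shared memory and returns the value the kernel read.
unsigned query_dynamic_smem(size_t bytes)
{
    unsigned* d_out = nullptr;
    unsigned h_out = 0;
    cudaMalloc(&d_out, sizeof(unsigned));
    report_dynamic_smem<<<1, 32, bytes>>>(d_out);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
    }
    cudaMemcpy(&h_out, d_out, sizeof(unsigned), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

With this sketch, `query_dynamic_smem(1024)` is expected to return 1024, i.e. the third launch-configuration parameter, obtained purely on the device side.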
    

    You can similarly obtain the total size of allocated shared memory (static + dynamic) from the register %total_smem_size.
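
The same pattern should work for `%total_smem_size`. Below is a sketch with a hypothetical kernel, `report_total_smem`, that also declares 512 bytes of static shared memory, plus a hypothetical host helper `query_total_smem`. One caveat, based on my reading of the PTX ISA documentation: this register is reported in multiples of the architecture's shared-memory allocation unit, so the value may be somewhat larger than the exact static + dynamic sum. It is safest to treat that sum as a lower bound.

```cuda
#include <cuda_runtime.h>

// Reads the total (static + dynamic) shared memory allocated to this block.
__device__ unsigned total_smem_size()
{
    unsigned ret;
    asm volatile ("mov.u32 %0, %%total_smem_size;" : "=r"(ret));
    return ret;
}

// Hypothetical kernel with 512 bytes of static shared memory. The buffer is
// written and then read across threads so the compiler keeps the allocation.
__global__ void report_total_smem(unsigned* out)
{
    __shared__ unsigned char static_buf[512];
    static_buf[threadIdx.x] = static_cast<unsigned char>(threadIdx.x);
    __syncthreads();
    if (threadIdx.x == 0) {
        out[0] = total_smem_size();
        out[1] = static_buf[blockDim.x - 1];  // value written by the last thread
    }
}

// Hypothetical host helper: launches one block of 32 threads with
// `dyn_bytes` of dynamic shared memory and returns the device-side reading.
unsigned query_total_smem(size_t dyn_bytes)
{
    unsigned* d_out = nullptr;
    unsigned h_out[2] = {0, 0};
    cudaMalloc(&d_out, 2 * sizeof(unsigned));
    report_total_smem<<<1, 32, dyn_bytes>>>(d_out);
    cudaMemcpy(h_out, d_out, 2 * sizeof(unsigned), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out[0];
}
```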

    Note: reading a special register does cost a few cycles. If you can compute this value a priori, you'll save that read, and the optimizer may also be able to take advantage of knowing the value at compile time.
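
As one illustration of computing the value a priori (a technique chosen for this sketch, not something the answer prescribes), the size can be made a compile-time template parameter. In the hypothetical code below, `kSmemFloats` fixes the number of floats in the dynamic buffer; the host derives the launch's shared-memory argument from the same constant, so the two cannot disagree, and the kernel's summation loop gets a compile-time trip count the compiler can unroll. `sum_via_smem` and `run_sum` are made-up names.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: the dynamic shared memory size (in floats) is known
// at compile time, so no special-register read is needed.
template <unsigned kSmemFloats>
__global__ void sum_via_smem(const float* in, float* out)
{
    extern __shared__ float buf[];
    // Threads cooperatively copy the input into shared memory.
    for (unsigned i = threadIdx.x; i < kSmemFloats; i += blockDim.x) {
        buf[i] = in[i];
    }
    __syncthreads();
    if (threadIdx.x == 0) {
        float s = 0.0f;
        #pragma unroll
        for (unsigned i = 0; i < kSmemFloats; ++i) {  // constant trip count
            s += buf[i];
        }
        *out = s;
    }
}

// Hypothetical host helper: the launch parameter is derived from the same
// template constant the kernel was compiled with.
template <unsigned kSmemFloats>
float run_sum(const float* h_in)
{
    float* d_in = nullptr;
    float* d_out = nullptr;
    float h_out = 0.0f;
    cudaMalloc(&d_in, kSmemFloats * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in, kSmemFloats * sizeof(float), cudaMemcpyHostToDevice);
    sum_via_smem<kSmemFloats><<<1, 32, kSmemFloats * sizeof(float)>>>(d_in, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

The trade-off is one kernel instantiation per size. When the size is only known at run time, passing it as a kernel argument or reading `%dynamic_smem_size` remain the options.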