Can I obtain the amount of allocated dynamic shared memory from within a kernel?

On the host side, I can save the amount of dynamic shared memory I intend to launch a kernel with, and use it. I can even pass that as an argument to the kernel. But is there a way to get it directly from device code, without help from the host side? That is, can the code for a kernel determine, as it runs, how much dynamic shared memory it has available?

Solution

Yes, there's a special register holding that value, named %dynamic_smem_size. You can obtain this register's value in your CUDA C/C++ code by wrapping some inline PTX with a getter function:

__device__ unsigned dynamic_smem_size()
{
    unsigned ret;
    // "%%" emits a literal '%' in inline PTX; a single '%' introduces an operand such as %0
    asm volatile ("mov.u32 %0, %%dynamic_smem_size;" : "=r"(ret));
    return ret;
}

You can similarly obtain the total size of allocated shared memory (static + dynamic) from the register %total_smem_size.
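To check the getter against a known launch configuration, something like the following minimal sketch could be used. The kernel name report_dynamic_smem_size and the host helper query_dynamic_smem_size are made up for this illustration, and error checking on the CUDA runtime calls is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Getter wrapping the inline PTX read of the special register.
__device__ unsigned dynamic_smem_size()
{
    unsigned ret;
    asm volatile ("mov.u32 %0, %%dynamic_smem_size;" : "=r"(ret));
    return ret;
}

// Hypothetical kernel: one thread stores the value it reads from the register.
__global__ void report_dynamic_smem_size(unsigned* out)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        *out = dynamic_smem_size();
    }
}

// Hypothetical host helper: launches with `bytes` of dynamic shared memory
// and returns what the device code observed.
unsigned query_dynamic_smem_size(size_t bytes)
{
    unsigned* d_out = nullptr;
    unsigned h_out = 0;
    cudaMalloc(&d_out, sizeof(unsigned));
    report_dynamic_smem_size<<<1, 32, bytes>>>(d_out);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(unsigned), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}

int main()
{
    const size_t requested = 1024;
    printf("requested: %zu bytes, seen on device: %u bytes\n",
           requested, query_dynamic_smem_size(requested));
    return 0;
}
```

The value seen on the device should match the third launch-configuration argument. The same pattern should work for %total_smem_size by swapping the register name in the asm string.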

Note: reading a special register does cost a few cycles. If you can compute this value a priori, you'll save that read, plus you might squeeze something out of the optimizer knowing that value.