Kernel launch specifying the stream, but with default shared memory size

I need to specify the stream for a kernel launch in CUDA. The kernel uses some shared memory with its size defined in the kernel code.

static const int cBlockSize = 256;

__global__ void fooKernel(void* param)
{
    __shared__ uint32_t words[cBlockSize/16];
    // implementation follows, using 2 bits of shared memory per thread
}

However, the shared memory size parameter comes before the stream parameter in a kernel launch expression. How do I tell CUDA to use the shared memory size specified by the kernel code and ignore what is in the launch configuration?

fooKernel<<<N/cBlockSize, cBlockSize, /* What to put here? */, stream>>>(param);

Obviously, I would like to avoid duplicating code by putting (cBlockSize/16)*sizeof(uint32_t) there again. In reality, the expression is more complex.

Solution

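Statically allocated and dynamically allocated shared memory are treated separately in many respects. The third kernel launch parameter sets only the size of the dynamically allocated portion (arrays declared extern __shared__); arrays declared with a compile-time size, such as words in your kernel, are allocated in addition to it.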

If you do not use dynamically allocated shared memory, it is safe to pass zero, the default value, as the third kernel launch parameter, regardless of how much statically allocated shared memory the kernel uses.
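As a minimal sketch of such a launch: the kernel body below is a hypothetical stand-in for the real implementation (each thread sets its own 2-bit field and the block reports how many bits are set), and the block count, the out parameter, and the host-side names are assumptions made for illustration. The point is only the launch line, where 0 is passed for the dynamic shared memory size while the static array is still allocated.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

static const int cBlockSize = 256;

// Hypothetical kernel body: only the static __shared__ declaration mirrors the question.
__global__ void fooKernel(uint32_t* out)
{
    // Statically allocated: the size is known at compile time,
    // so the launch configuration does not need to mention it.
    __shared__ uint32_t words[cBlockSize / 16];

    if (threadIdx.x < cBlockSize / 16)
        words[threadIdx.x] = 0;
    __syncthreads();

    // Each thread owns a 2-bit field and sets its low bit.
    atomicOr(&words[threadIdx.x / 16], 1u << (2 * (threadIdx.x % 16)));
    __syncthreads();

    if (threadIdx.x == 0) {
        uint32_t total = 0;
        for (int i = 0; i < cBlockSize / 16; ++i)
            total += __popc(words[i]);
        out[blockIdx.x] = total;  // one set bit per thread in the block
    }
}

int main()
{
    const int numBlocks = 4;  // assumed value, for illustration only

    uint32_t* dOut = nullptr;
    cudaMalloc(&dOut, numBlocks * sizeof(uint32_t));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Third parameter: bytes of *dynamic* shared memory. Zero is fine here
    // because the kernel uses only statically allocated shared memory.
    fooKernel<<<numBlocks, cBlockSize, 0, stream>>>(dOut);

    uint32_t hOut[numBlocks];
    cudaMemcpyAsync(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    for (int i = 0; i < numBlocks; ++i)
        printf("block %d: %u\n", i, hOut[i]);

    cudaStreamDestroy(stream);
    cudaFree(dOut);
    return 0;
}
```

If the kernel also declared an extern __shared__ array, the third parameter would need to be the byte size of that array alone; the static words array would still not be counted in it.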