Shared memory and streams when launching a kernel

I'm new to CUDA and working on a personal project. I know that if I want to specify the amount of shared memory at launch, I write:

kernel<<<grid_size,block_size,shared_mem_size>>>(parameters);

On the other hand, if I want to launch a kernel into a stream, I write:

kernel<<<grid_size,block_size,0,stream_being_used>>>(parameters);

I don't understand why the third parameter is 0 in the stream case. (I'm taking this from chapter 10 of "CUDA by Example" by Sanders and Kandrot.)

If I want to specify the shared memory size at launch AND put the kernel into a stream, how do I do that correctly? In other words, what should the parameters between <<<...>>> look like?

Solution

The only reason the 0 is there is that, in that particular example, no dynamic shared memory is required.

Shared memory can be allocated either statically (without extern, in which case the size is stated explicitly in the declaration) or dynamically (using extern, in which case the size is passed as the 3rd parameter of the kernel launch configuration).
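To make the difference concrete, here is a minimal sketch, assuming a single block of 64 threads. The kernel names `reverse_static` and `reverse_dynamic` and the helper functions are hypothetical, chosen for illustration. Both kernels reverse an array through shared memory; only the dynamic one needs the third launch parameter.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 64;  // assumption: one block of N threads

// Static allocation: no extern, size fixed in the declaration,
// so the launch needs no third configuration parameter.
__global__ void reverse_static(int *d, int n)
{
    __shared__ int s[N];
    int t = threadIdx.x;
    s[t] = d[t];
    __syncthreads();             // all writes to s finish before any read
    d[t] = s[n - t - 1];
}

// Dynamic allocation: unsized extern array; its size in bytes comes
// from the third launch-configuration parameter.
__global__ void reverse_dynamic(int *d, int n)
{
    extern __shared__ int s[];
    int t = threadIdx.x;
    s[t] = d[t];
    __syncthreads();
    d[t] = s[n - t - 1];
}

// Hypothetical helper: counts elements not in reversed order N-1..0.
static int count_mismatches(const int *out)
{
    int bad = 0;
    for (int i = 0; i < N; ++i)
        if (out[i] != N - 1 - i) ++bad;
    return bad;
}

// Runs both kernels on 0..N-1. Returns the total number of wrong
// elements, or -1 if a launch fails; 0 means both versions worked.
int run_demo()
{
    int h[N], out[N];
    for (int i = 0; i < N; ++i) h[i] = i;

    int *d = nullptr;
    cudaMalloc(&d, N * sizeof(int));
    int errors = 0;

    cudaMemcpy(d, h, N * sizeof(int), cudaMemcpyHostToDevice);
    reverse_static<<<1, N>>>(d, N);                   // grid, block only
    if (cudaGetLastError() != cudaSuccess) { cudaFree(d); return -1; }
    cudaMemcpy(out, d, N * sizeof(int), cudaMemcpyDeviceToHost);
    errors += count_mismatches(out);

    cudaMemcpy(d, h, N * sizeof(int), cudaMemcpyHostToDevice);
    reverse_dynamic<<<1, N, N * sizeof(int)>>>(d, N); // 3rd: dynamic smem bytes
    if (cudaGetLastError() != cudaSuccess) { cudaFree(d); return -1; }
    cudaMemcpy(out, d, N * sizeof(int), cudaMemcpyDeviceToHost);
    errors += count_mismatches(out);

    cudaFree(d);
    return errors;
}

int main()
{
    int errors = run_demo();
    std::printf("mismatches: %d\n", errors);
    assert(errors == 0);
    return 0;
}
```

Launching `reverse_dynamic` without the third parameter would give it 0 bytes of dynamic shared memory, so its accesses to `s` would be out of bounds.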

The kernel launch configuration parameters <<<...>>> always show up in the same order:

1. the grid dimensions
2. the threadblock dimensions
3. the size of dynamically allocated shared memory (in bytes)
4. the stream to launch the kernel in

Parameters 1 and 2 are mandatory; 3 and 4 are optional (they default to 0 bytes and the default stream, respectively). But if you need to specify parameter 4 (the stream), you must also supply parameter 3, even if it is zero.

So the correct sequence is:

kernel_name<<<grid_dim, threadblock_dim, dynamic_shared_memory_size, stream>>>(...);
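Putting both together, here is a minimal sketch that launches a kernel with dynamic shared memory into a non-default stream. The kernel `block_sum`, the helper `run_demo`, and the sizes are hypothetical choices for illustration. The reduction assumes a power-of-two block size, and the sketch uses pinned host memory (`cudaMallocHost`) so that `cudaMemcpyAsync` can actually overlap with host work.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each block sums its slice of `in` into out[blockIdx.x].
// The shared buffer is sized at launch (3rd configuration parameter).
__global__ void block_sum(const int *in, int *out, int n)
{
    extern __shared__ int partial[];     // blockDim.x ints
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    partial[tid] = (i < n) ? in[i] : 0;
    __syncthreads();
    // Tree reduction; assumes blockDim.x is a power of two.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) partial[tid] += partial[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = partial[0];
}

// Sums n ones on the GPU in a user-created stream; returns the total,
// or -1 if the launch fails.
int run_demo()
{
    const int n = 1000;
    const int block = 256;                            // power of two
    const int grid = (n + block - 1) / block;
    const size_t smem_bytes = block * sizeof(int);    // 3rd launch parameter

    int *h_in = nullptr, *h_out = nullptr;
    cudaMallocHost(&h_in, n * sizeof(int));           // pinned host memory
    cudaMallocHost(&h_out, grid * sizeof(int));
    for (int i = 0; i < n; ++i) h_in[i] = 1;

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, grid * sizeof(int));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    cudaMemcpyAsync(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice, stream);
    // Order: grid, block, dynamic shared memory bytes, stream.
    block_sum<<<grid, block, smem_bytes, stream>>>(d_in, d_out, n);
    cudaError_t launch_err = cudaGetLastError();
    cudaMemcpyAsync(h_out, d_out, grid * sizeof(int), cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);                    // wait for all work in the stream

    int total = -1;
    if (launch_err == cudaSuccess) {
        total = 0;
        for (int b = 0; b < grid; ++b) total += h_out[b];
    }

    cudaStreamDestroy(stream);
    cudaFree(d_in);
    cudaFree(d_out);
    cudaFreeHost(h_in);
    cudaFreeHost(h_out);
    return total;
}

int main()
{
    int total = run_demo();
    std::printf("total: %d\n", total);
    assert(total == 1000);
    return 0;
}
```

For a kernel that needs no dynamic shared memory, the third argument would simply be 0, which is the pattern in the book's stream example.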

You can read more about this in the documentation.