c++ cuda gpu-shared-memory

Correct kernel call in case of using dynamic shared memory allocation


I didn't find any information about how to allocate static and dynamic shared memory in the same kernel, or, to ask more precisely:

How to call a kernel where the amount of shared memory that needs to be allocated is just partly known at compilation time?

Referring to "allocating shared memory", for example, it is pretty clear how this works for purely dynamic allocation.

But let's assume I have the following kernel:

__global__ void MyKernel(int Float4ArrSize, int FloatArrSize)
{
  __shared__ float Arr1[256];
  __shared__ char  Arr2[256];
  extern __shared__ float DynamArr[];
  float4* DynamArr1 = (float4*) DynamArr;
  float*  DynamArr2 = (float*) &DynamArr1[Float4ArrSize];
  
  // do something
}

Kernel Launch:

// the third launch parameter is a size in bytes, not an element count
int SharedMemorySize = Float4ArrSize * sizeof(float4) + FloatArrSize * sizeof(float);

MyKernel<<< numBlocks, threadsPerBlock, SharedMemorySize, stream>>>(Float4ArrSize, FloatArrSize);

I wasn't able to figure out how the compiler associates this size only with the part I want to allocate dynamically. Or does the parameter SharedMemorySize represent the total amount of shared memory per block, so that I also need to add the size of the static arrays (i.e. the SharedMemorySize computed above + 256*sizeof(float) + 256*sizeof(char))?

Please enlighten me or just simply point to some code snippets. Thanks a lot in advance.

cheers greg


Solution

  • Quoting the CUDA programming guide: SharedMemorySize specifies the number of bytes in shared memory that is dynamically allocated per block for this call in addition to the statically allocated memory; this dynamically allocated memory is used by any of the variables declared as an external array. SharedMemorySize is an optional argument which defaults to 0. In other words, the static arrays (Arr1 and Arr2) are accounted for by the compiler, and you pass only the size of the dynamic part at launch.

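    So if I understand what you want to do, it should probably look like this:

    extern __shared__ float DynamArr[];
    // DynamArr1_size = number of floats in DynamArr1; keep it a multiple of 4
    // so that DynamArr2 starts on the 16-byte boundary float4 requires
    float*  DynamArr1 = DynamArr;
    float4* DynamArr2 = (float4*) &DynamArr[DynamArr1_size];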
    

    Be aware, I didn't test it.

    Here is a very useful post.
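
To make the point above concrete, here is a minimal, self-contained sketch (not the asker's real kernel) that mixes static and dynamic shared memory and passes only the dynamic byte count at launch. Several things are assumptions made for illustration: the extra `out` parameter, the fill values, the sizes 10 and 7, the block size of 256, and declaring the extern array as `float4` (named `DynamShared` here) so that its base is 16-byte aligned. The call to `cudaFuncGetAttributes` prints the static size the runtime reports for the kernel, which does not include the dynamic part.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void MyKernel(float* out, int Float4ArrSize, int FloatArrSize)
{
    // Static shared memory: sized at compile time, not part of the launch argument.
    __shared__ float Arr1[256];
    __shared__ char  Arr2[256];

    // Dynamic shared memory: one extern array whose byte size comes from the
    // third launch parameter. Declaring it as float4 gives a 16-byte aligned base.
    extern __shared__ float4 DynamShared[];
    float4* DynamArr1 = DynamShared;
    float*  DynamArr2 = reinterpret_cast<float*>(&DynamArr1[Float4ArrSize]);

    const int tid = threadIdx.x;  // this demo assumes blockDim.x == 256
    Arr1[tid] = 1.0f;
    Arr2[tid] = 1;
    for (int i = tid; i < Float4ArrSize; i += blockDim.x)
        DynamArr1[i] = make_float4(1.0f, 1.0f, 1.0f, 1.0f);
    for (int i = tid; i < FloatArrSize; i += blockDim.x)
        DynamArr2[i] = 1.0f;
    __syncthreads();

    // Thread 0 sums everything so the host can verify that all four regions
    // were usable and did not overlap in an obviously wrong way.
    if (tid == 0) {
        float sum = 0.0f;
        for (int i = 0; i < 256; ++i)
            sum += Arr1[i] + Arr2[i];
        for (int i = 0; i < Float4ArrSize; ++i)
            sum += DynamArr1[i].x + DynamArr1[i].y + DynamArr1[i].z + DynamArr1[i].w;
        for (int i = 0; i < FloatArrSize; ++i)
            sum += DynamArr2[i];
        out[blockIdx.x] = sum;
    }
}

int main()
{
    const int Float4ArrSize   = 10;   // illustrative values
    const int FloatArrSize    = 7;
    const int numBlocks       = 2;
    const int threadsPerBlock = 256;

    // Only the dynamic part, in bytes. Arr1 and Arr2 are NOT added here.
    const size_t SharedMemorySize =
        Float4ArrSize * sizeof(float4) + FloatArrSize * sizeof(float);

    float* d_out = nullptr;
    if (cudaMalloc(&d_out, numBlocks * sizeof(float)) != cudaSuccess) return 1;

    MyKernel<<<numBlocks, threadsPerBlock, SharedMemorySize>>>(d_out, Float4ArrSize, FloatArrSize);
    cudaError_t err = cudaGetLastError();            // launch configuration errors

    float h_out[numBlocks] = {};
    if (err == cudaSuccess)                          // cudaMemcpy also waits for the kernel
        err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, MyKernel) == cudaSuccess)
        printf("static shared bytes reported by the runtime: %zu\n", attr.sharedSizeBytes);
    printf("dynamic shared bytes passed at launch:       %zu\n", SharedMemorySize);

    // 256 ones in Arr1, 256 ones in Arr2, 4 per float4 element, 1 per float element.
    const float expected = 256.0f + 256.0f + 4.0f * Float4ArrSize + FloatArrSize;
    for (int b = 0; b < numBlocks; ++b) {
        if (h_out[b] != expected) {
            printf("block %d: got %f, expected %f\n", b, h_out[b], expected);
            return 1;
        }
    }
    printf("all blocks produced the expected sum\n");
    return 0;
}
```

It should build with something like `nvcc demo.cu -o demo` (the file name is arbitrary). Putting the float4 region first, as in the question, keeps both regions naturally aligned for any element counts. Keep in mind that the static and dynamic parts together still have to fit into the per-block shared memory limit of the device.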