Tags: cuda, gpu, shared-memory

Reallocation of shared memory in CUDA


I have a question about CUDA C++ programming. I am using shared memory, but I need more of it than is available, so I was trying to reuse the same shared memory. My code looks like:

__global__ void dist_calculation(...){

   ..........
   {
        //1st pass
       __shared__ short unsigned int shared_nodes[(number_of_nodes-1)*blocksize];

       ............

   }

   {
       //2nd pass
       __shared__ float s_distance_matrix[(number_of_nodes*(number_of_nodes-1))/2];

       ........
   }
}

Shared memory can't accommodate both shared_nodes and s_distance_matrix together, but it can accommodate each one separately (I have tested this). In the 2nd pass the program can't refer to shared_nodes (since it is scoped to the 1st pass), yet the compiler still reports that there is not enough shared memory. So it looks like space is still allocated for the shared_nodes variable. Is there any way to release that allocation (like cudaFree)? Or any other suggestions?


Solution

  • Allocate a single untyped buffer large enough to accommodate either array and reinterpret the array for each pass of your algorithm:

    __global__ void dist_calculation(...)
    {
       // Statically declared shared arrays need compile-time sizes
       constexpr unsigned int num_bytes1 = sizeof(unsigned short) * (number_of_nodes-1) * block_size;
    
       constexpr unsigned int num_bytes2 = (sizeof(float) * number_of_nodes * (number_of_nodes-1)) / 2;
    
       constexpr unsigned int num_shared_bytes = num_bytes1 > num_bytes2 ? num_bytes1 : num_bytes2;
    
       // Align the raw buffer for the most strictly aligned type stored in it
       __shared__ alignas(float) char smem[num_shared_bytes];
    
       unsigned short *shared_nodes = reinterpret_cast<unsigned short*>(smem);
       first_pass(shared_nodes);
    
       // Ensure all threads are done with the 1st pass before the buffer is reused
       __syncthreads();
    
       float *distance_matrix = reinterpret_cast<float*>(smem);
       second_pass(distance_matrix);    
    }