I have a question about CUDA C++ programming. I am using shared memory, but I need more shared memory than is available, so I was trying to reuse it across passes. My code looks like this:
__global__ void dist_calculation(...){
..........
{
//1st pass
__shared__ short unsigned int shared_nodes[(number_of_nodes-1)*blocksize];
............
}
{
//2nd pass
__shared__ float s_distance_matrix[(number_of_nodes*(number_of_nodes-1))/2];
........
}
}
Shared memory can't accommodate both shared_nodes and s_distance_matrix together, but it can accommodate each one separately (I have tested this). In the 2nd pass, the program can no longer see the name shared_nodes (as it is scoped to the 1st pass), yet the compiler still reports that shared memory doesn't have enough space. So it looks like space is still allocated for the shared_nodes variable. Is there any way to release that allocation (like cudaFree)? Or any other suggestions?
Allocate a single untyped buffer large enough to hold either array, and reinterpret it for each pass of your algorithm:
__global__ void dist_calculation(...)
{
    // Sizes must be compile-time constants for a statically declared shared array.
    constexpr unsigned int num_bytes1 = sizeof(unsigned short) * (number_of_nodes - 1) * block_size;
    constexpr unsigned int num_bytes2 = (sizeof(float) * number_of_nodes * (number_of_nodes - 1)) / 2;
    constexpr unsigned int num_shared_bytes = num_bytes1 > num_bytes2 ? num_bytes1 : num_bytes2;

    // Align the raw buffer so it can safely be reinterpreted as float.
    __shared__ __align__(4) char smem[num_shared_bytes];

    unsigned short *shared_nodes = reinterpret_cast<unsigned short *>(smem);
    first_pass(shared_nodes);

    __syncthreads(); // make sure every thread finished the 1st pass before the buffer is reused

    float *distance_matrix = reinterpret_cast<float *>(smem);
    second_pass(distance_matrix);
}