I am working with an array of structures (AoS), and each CUDA block will use the data of one structure (and only one) and do a lot of computation around it. In order for the program to work I would like to store the structure in shared memory.
I have tried to use the memcpy function like this:
struct LABEL_2D{
int a;
float * b[MAX];
};
Inside the kernel:
__shared__ struct LABEL_2D self_label;
if(threadIdx.x == 0){
memcpy(&self_label,
label+(blockIdx.x*sizeof(struct LABEL_2D)),
sizeof(struct LABEL_2D));
}
__syncthreads();
But on execution I got the following error:
unspecified launch failure cudaGetLastError()
I am wondering if it is possible to store a structure into shared memory.
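You don't need memcpy() to copy the data. Also note that if label is a struct LABEL_2D pointer, label + blockIdx.x*sizeof(struct LABEL_2D) applies the element size twice, because pointer arithmetic already counts in elements; the resulting out-of-bounds read is a likely cause of the launch failure.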
You can let the first thread of each block, as you did, initialize the shared memory variable with a plain structure assignment:
struct LABEL_2D{
int a;
float * b[MAX];
};
__shared__ LABEL_2D self_label;
if(threadIdx.x == 0){
self_label = label[blockIdx.x];
}
__syncthreads();
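For reference, here is a minimal sketch of how the snippet above might fit into a complete kernel plus a host-side check. Several things are assumptions that do not appear in the question: the value of MAX, the kernel name use_label, the out array, the helper run_demo(), and the launch configuration of 8 blocks of 64 threads. The b pointers are left null because the sketch never dereferences them.

```cuda
#include <cuda_runtime.h>

#define MAX 4  // assumed value; the question does not show how MAX is defined

struct LABEL_2D {
    int a;
    float *b[MAX];
};

// Each block copies exactly one structure into shared memory, then all
// threads of the block read from that shared copy.
__global__ void use_label(const LABEL_2D *label, int *out)
{
    __shared__ LABEL_2D self_label;

    if (threadIdx.x == 0) {
        // label[blockIdx.x] indexes by element, so no sizeof() is needed.
        self_label = label[blockIdx.x];
    }
    __syncthreads();  // make thread 0's write visible to the whole block

    // Placeholder computation (hypothetical): the last thread of the block
    // reads the shared copy written by thread 0.
    if (threadIdx.x == blockDim.x - 1) {
        out[blockIdx.x] = self_label.a * 2;
    }
}

// Hypothetical host helper: returns true if every block saw its own structure.
bool run_demo()
{
    const int n = 8;  // number of structures, one per block (assumed)
    LABEL_2D h_label[n];
    for (int i = 0; i < n; ++i) {
        h_label[i].a = i;
        for (int j = 0; j < MAX; ++j) h_label[i].b[j] = nullptr;
    }

    LABEL_2D *d_label = nullptr;
    int *d_out = nullptr;
    if (cudaMalloc(&d_label, sizeof(h_label)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) {
        cudaFree(d_label);
        return false;
    }
    cudaMemcpy(d_label, h_label, sizeof(h_label), cudaMemcpyHostToDevice);

    use_label<<<n, 64>>>(d_label, d_out);

    int h_out[n] = {0};
    // This blocking copy also surfaces errors raised during kernel execution.
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_label);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) {
        if (h_out[i] != 2 * i) return false;
    }
    return true;
}
```

Calling run_demo() from main() and checking its return value should be enough to confirm that each block sees its own structure. Keep in mind that the assignment copies only the pointer values in b, not the float data they point to, so any real b pointers must be device pointers that remain valid while the kernel runs.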
EDIT: Deleted other workarounds because they were practically useless.