Load a structure into GPU shared memory

I am working with an array of structures (AoS), and each CUDA block will use the data of one structure (and only one) and do a lot of computation around it. In order for the program to work I would like to store the structure in shared memory.

I have tried to use the memcpy function like this:

struct LABEL_2D{
    int a;
    float * b[MAX];
};

Inside the kernel:

__shared__ struct LABEL_2D self_label;

if(threadIdx.x == 0){
    memcpy(&self_label,
           label+(blockIdx.x*sizeof(struct LABEL_2D)),
           sizeof(struct LABEL_2D));
}
__syncthreads();

But on execution I got the following error:

unspecified launch failure cudaGetLastError()

I am wondering if it is possible to store a structure into shared memory.

Solution

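You do not need memcpy() for this. Note also that the source address in your call, label+(blockIdx.x*sizeof(struct LABEL_2D)), most likely points far past the structure you want: if label is a pointer to struct LABEL_2D, pointer arithmetic already advances in units of one structure, so the extra sizeof factor makes the copy read out of bounds. That would explain the unspecified launch failure.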

You can keep using the first thread of the block, as you did, and simply assign the structure to the shared memory variable:

struct LABEL_2D{
    int a;
    float * b[MAX];
};


__shared__ LABEL_2D self_label;

if(threadIdx.x == 0){
    // label[blockIdx.x] already indexes whole structures; no sizeof() needed
    self_label = label[blockIdx.x];
}
__syncthreads();
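For completeness, here is a minimal sketch of the same idea as a full kernel with a small host-side check. It assumes that label is a device pointer to an array of LABEL_2D with one entry per block. The value of MAX, the kernel name use_label, the output array, and the helper run_shared_struct_demo are all made up for illustration and are not from the question.

```cuda
#include <cuda_runtime.h>

#define MAX 4  // assumed value; the question does not say what MAX is

struct LABEL_2D {
    int a;
    float *b[MAX];
};

// Each block copies its own structure into shared memory once,
// then every thread of the block reads from that shared copy.
__global__ void use_label(const LABEL_2D *label, int *out)
{
    __shared__ LABEL_2D self_label;

    if (threadIdx.x == 0) {
        // Indexing steps in units of LABEL_2D, so no sizeof() factor.
        self_label = label[blockIdx.x];
    }
    __syncthreads();  // make the shared copy visible to the whole block

    out[blockIdx.x * blockDim.x + threadIdx.x] = self_label.a + threadIdx.x;
}

// Hypothetical host-side driver: returns the number of wrong results
// (0 on success) or -1 if a CUDA call fails.
int run_shared_struct_demo()
{
    const int blocks = 3, threads = 32;

    LABEL_2D h_label[blocks];
    for (int i = 0; i < blocks; ++i) {
        h_label[i].a = 100 * i;
        for (int j = 0; j < MAX; ++j) h_label[i].b[j] = nullptr;  // unused here
    }

    LABEL_2D *d_label = nullptr;
    int *d_out = nullptr;
    if (cudaMalloc(&d_label, sizeof(h_label)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, blocks * threads * sizeof(int)) != cudaSuccess) {
        cudaFree(d_label);
        return -1;
    }

    int h_out[blocks * threads];
    cudaError_t err = cudaMemcpy(d_label, h_label, sizeof(h_label),
                                 cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        use_label<<<blocks, threads>>>(d_label, d_out);
        err = cudaGetLastError();            // launch errors
    }
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();       // errors during execution
    }
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_label);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    // Every thread should have seen the 'a' field of its own block's structure.
    int mismatches = 0;
    for (int b = 0; b < blocks; ++b)
        for (int t = 0; t < threads; ++t)
            if (h_out[b * threads + t] != 100 * b + t) ++mismatches;
    return mismatches;
}
```

Calling run_shared_struct_demo() from main() and checking for a return value of 0 is enough to try it. Keep in mind that the structure assignment copies only the pointer values in b, not the float data they point to, so those pointers must themselves refer to device memory if the kernel dereferences them.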

EDIT: Deleted other workarounds because they were practically useless.