
Load a structure into GPU shared memory


I am working with an array of structures (AoS), and each CUDA block will use the data of one structure (and only one) and do a lot of computation around it. In order for the program to work I would like to store the structure in shared memory.

I have tried to use the memcpy function like this:

struct LABEL_2D{
    int a;
    float * b[MAX];
};

Inside the kernel:

__shared__ struct LABEL_2D self_label;

if(threadIdx.x == 0){
    memcpy(&self_label,
           label+(blockIdx.x*sizeof(struct LABEL_2D)),
           sizeof(struct LABEL_2D));
}
__syncthreads();

But on execution I got the following error:

unspecified launch failure cudaGetLastError()

I am wondering if it is possible to store a structure into shared memory.


Solution

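  • You do not need memcpy() for this. Note also that, if label is declared as struct LABEL_2D *, the offset blockIdx.x*sizeof(struct LABEL_2D) is scaled by the element size a second time by the pointer arithmetic, so the copy reads far past the intended element; that is a likely cause of the launch failure.

    As you did, let the first thread initialize the shared-memory variable, but with a plain struct assignment: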

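    struct LABEL_2D{
        int a;
        float * b[MAX];
    };

    // Inside the kernel:
    __shared__ LABEL_2D self_label;

    if(threadIdx.x == 0){
        // Plain struct assignment; label[] indexes by element, not by byte.
        self_label = label[blockIdx.x];
    }
    __syncthreads();  // make the copy visible to every thread in the block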
    

    EDIT: Deleted other workarounds because they were practically useless.
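
For reference, here is a minimal, self-contained sketch of the pattern above: one block per structure, thread 0 copies its structure into shared memory, and all threads read it after the barrier. Everything beyond LABEL_2D, self_label and label is an assumption made for illustration: MAX = 4, the kernel name use_label, the helper run_demo, and the test data are hypothetical and not from the original post.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define MAX 4  // assumed value; the original post does not give MAX

struct LABEL_2D {
    int a;
    float *b[MAX];  // each entry must point to device memory
};

// One block per label. Thread 0 copies the block's struct into shared
// memory; after the barrier every thread in the block can read it.
__global__ void use_label(const LABEL_2D *label, float *out)
{
    __shared__ LABEL_2D self_label;

    if (threadIdx.x == 0) {
        self_label = label[blockIdx.x];  // index counts elements, not bytes
    }
    __syncthreads();

    // Thread t dereferences pointer b[t] of the shared copy.
    out[blockIdx.x * MAX + threadIdx.x] =
        self_label.a + *self_label.b[threadIdx.x];
}

// Hypothetical host driver: returns the number of mismatching results,
// or -1 if a CUDA call reported an error.
int run_demo()
{
    const int n_labels = 3;
    const size_t n_vals = n_labels * MAX;

    std::vector<float> h_vals(n_vals);
    for (size_t i = 0; i < n_vals; ++i) h_vals[i] = 0.5f * i;

    float *d_vals = nullptr, *d_out = nullptr;
    LABEL_2D *d_label = nullptr;
    cudaMalloc(&d_vals, n_vals * sizeof(float));
    cudaMalloc(&d_out, n_vals * sizeof(float));
    cudaMalloc(&d_label, n_labels * sizeof(LABEL_2D));
    cudaMemcpy(d_vals, h_vals.data(), n_vals * sizeof(float),
               cudaMemcpyHostToDevice);

    // Build the structs on the host, filling b[] with device addresses.
    std::vector<LABEL_2D> h_label(n_labels);
    for (int l = 0; l < n_labels; ++l) {
        h_label[l].a = 10 * l;
        for (int j = 0; j < MAX; ++j) h_label[l].b[j] = d_vals + l * MAX + j;
    }
    cudaMemcpy(d_label, h_label.data(), n_labels * sizeof(LABEL_2D),
               cudaMemcpyHostToDevice);

    use_label<<<n_labels, MAX>>>(d_label, d_out);
    cudaError_t err = cudaGetLastError();            // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors

    std::vector<float> h_out(n_vals);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out.data(), d_out, n_vals * sizeof(float),
                         cudaMemcpyDeviceToHost);

    cudaFree(d_vals);
    cudaFree(d_out);
    cudaFree(d_label);

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return -1;
    }

    int mismatches = 0;
    for (int l = 0; l < n_labels; ++l)
        for (int j = 0; j < MAX; ++j)
            if (h_out[l * MAX + j] != 10 * l + h_vals[l * MAX + j]) ++mismatches;
    return mismatches;
}

int main()
{
    int bad = run_demo();
    printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Note that only the pointers in b are copied into shared memory; the floats they point to stay in global memory. If the computation reads those values many times, it may be worth staging them in a separate shared array as well.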