cuda dynamic-memory-allocation access-violation pointer-to-pointer

CUDA: pointer to pointer memory access

I can't figure out what is causing the issue. I get an "access violation writing location" error on the last line. Am I allocating the memory incorrectly?

typedef struct {
    doubleXYZW cen_sum; // struct with 4 doubles
    double STS[6];
    XYZW *Points;       // struct with 4 floats
} BUNDLE;

BUNDLE *cpu_data = NULL;
size_t bundle_size = NUM_POINTS * sizeof(XYZW) + sizeof(doubleXYZW) + 6*sizeof(double);
HANDLE_ERROR(cudaMallocHost((BUNDLE**)&cpu_data, bundle_size));
//error in the next line
cpu_data->Points[0].x = 0; //x is the first element in the XYZW struct

Solution

You have 2 allocations that must be done, and you are only performing one of them.

You are allocating some storage for the cpu_data pointer, but you have not allocated any storage for the Points pointer. Therefore when you dereference Points:

cpu_data->Points[0].x = 0;
         ^      ^
         |      this dereferences the Points pointer (NOT allocated!)
         |
        this dereferences the cpu_data pointer (allocated)

you are dereferencing a pointer that was never set to point at valid storage (it holds whatever happened to be in the newly allocated block), so the write through it is an invalid access.

You have (at least) two options to fix it:

1. After you have allocated space for cpu_data, perform a second cudaMallocHost allocation for cpu_data->Points.

2. If you know the size of the Points array at compile time (it seems you do: NUM_POINTS), declare it as a fixed-size array inside the struct:

typedef struct {
    doubleXYZW cen_sum;      // struct with 4 doubles
    double STS[6];
    XYZW Points[NUM_POINTS]; // struct with 4 floats
} BUNDLE;
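
For the first option (keeping Points as a pointer), the two allocations might look roughly like the sketch below. It uses plain malloc/free so that it compiles without the CUDA toolkit; with CUDA, each malloc would become a cudaMallocHost call and each free a cudaFreeHost call. The layouts of XYZW and doubleXYZW, the value of NUM_POINTS, and the helper names bundle_alloc and bundle_free are assumptions made for illustration, not taken from the question.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_POINTS 1024 /* assumed value; the question does not state it */

/* Assumed layouts, inferred from the comments in the question. */
typedef struct { float x, y, z, w; } XYZW;
typedef struct { double x, y, z, w; } doubleXYZW;

typedef struct {
    doubleXYZW cen_sum;
    double STS[6];
    XYZW *Points; /* points at a separately allocated array */
} BUNDLE;

/* Hypothetical helper: allocation 1 is the struct, allocation 2 is the
   array that Points refers to. Returns NULL if either allocation fails. */
BUNDLE *bundle_alloc(size_t num_points)
{
    assert(num_points > 0);

    BUNDLE *b = (BUNDLE *)malloc(sizeof(BUNDLE));
    if (b == NULL)
        return NULL;

    b->Points = (XYZW *)malloc(num_points * sizeof(XYZW));
    if (b->Points == NULL) {
        free(b);
        return NULL;
    }
    return b;
}

/* Hypothetical helper: release in the reverse order of allocation. */
void bundle_free(BUNDLE *b)
{
    if (b == NULL)
        return;
    free(b->Points);
    free(b);
}
```

After `BUNDLE *cpu_data = bundle_alloc(NUM_POINTS);` succeeds, the write `cpu_data->Points[0].x = 0;` goes through a pointer that refers to real storage.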

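Note that your bundle_size calculation only makes sense for the 2nd method, where the points are stored inside the struct. If you go with the first method, it is incorrect: the struct itself needs only sizeof(BUNDLE) bytes, and the Points array needs its own allocation of NUM_POINTS * sizeof(XYZW) bytes. In any event, with either method, it's easier and safer to size the struct allocation as sizeof(BUNDLE), which also accounts for any padding the compiler inserts.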
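
As a minimal sketch of the second method with the size taken from sizeof(BUNDLE): again malloc stands in for cudaMallocHost so the code compiles without CUDA, and the struct layouts, the NUM_POINTS value, and the helper names manual_bundle_size and bundle_alloc_embedded are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_POINTS 1024 /* assumed value; the question does not state it */

/* Assumed layouts, inferred from the comments in the question. */
typedef struct { float x, y, z, w; } XYZW;
typedef struct { double x, y, z, w; } doubleXYZW;

typedef struct {
    doubleXYZW cen_sum;
    double STS[6];
    XYZW Points[NUM_POINTS]; /* array lives inside the struct */
} BUNDLE;

/* The hand-written sum from the question, kept only for comparison. */
size_t manual_bundle_size(void)
{
    return NUM_POINTS * sizeof(XYZW) + sizeof(doubleXYZW) + 6 * sizeof(double);
}

/* Hypothetical helper: a single allocation covers everything, because
   sizeof(BUNDLE) already includes the embedded array and any padding. */
BUNDLE *bundle_alloc_embedded(void)
{
    /* sizeof can never be smaller than the sum of the member sizes. */
    assert(sizeof(BUNDLE) >= manual_bundle_size());
    return (BUNDLE *)malloc(sizeof(BUNDLE));
}
```

With this layout, `cpu_data->Points[0].x = 0;` writes into the same block that cpu_data points to, so no second allocation is needed and a single free (or cudaFreeHost) releases everything.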

To be clear, there is nothing CUDA-specific here (the error would be present e.g. if you used malloc instead of cudaMallocHost). The problem is rooted in basic C understanding, not CUDA.