
How to copy data from unsigned int to ulong4 in CUDA


.h file:

#define VECTOR_SIZE 1024   

.cpp file:

void CopyToDevice (unsigned int *A); // defined in the .cu file

int main ()
{
    unsigned int* A;
    A = new unsigned int [VECTOR_SIZE];

    CopyToDevice (A);
    delete [] A;
}

.cu file:

void CopyToDevice (unsigned int *A)
{
    ulong4 *UA;
    unsigned int VectorSizeUlong4 = VECTOR_SIZE / 4;
    unsigned int VectorSizeBytesUlong4 = VectorSizeUlong4 * sizeof(ulong4);

    cudaMalloc( (void**)&UA, VectorSizeBytesUlong4 );

    // how to use cudaMemcpy to copy data from A to UA?

    // I tried the following, but it gave an access violation error:
    for (unsigned int i=0; i<VectorSizeUlong4; ++i)
    {
        UA[i].x = A[i*4 + 0];
        UA[i].y = A[i*4 + 1];
        UA[i].z = A[i*4 + 2];
        UA[i].w = A[i*4 + 3];
    }
    // I also tried copying *A to the device and working on it there, instead of going back to the CPU to access *A every time, but that did not work either
}



Solution

  • The CUDA ulong4 is a 16 byte aligned structure defined as

    struct __builtin_align__(16) ulong4
    {
      unsigned long int x, y, z, w;
    };
    

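    This means that, on platforms where unsigned long int is 32 bits wide (for example Windows, including 64-bit builds), four consecutive 32-bit unsigned source integers occupy exactly the same 16 bytes as one ulong4. On LP64 platforms such as 64-bit Linux, unsigned long int is 64 bits, so sizeof(ulong4) is 32 and the reinterpretation below no longer holds; uint4, whose components are unsigned int, is the 16-byte type that matches on every platform.

    Where the sizes do match, the simplest solution is contained right in the text of the image you posted: pass the unsigned int pointer to cudaMemcpy (it takes void pointers, so the conversion is implicit) or cast it explicitly to a ulong4 pointer, copy from host memory to device memory, and pass the resulting device pointer to whatever kernel function you have that requires a ulong4 input. Your device transfer function could look something like: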

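    ulong4* CopyToDevice (unsigned int* A)
    {
        // The byte-for-byte copy below is only valid when one ulong4 is the
        // same size as four unsigned ints (true where unsigned long is 32 bits)
        static_assert(sizeof(ulong4) == 4 * sizeof(unsigned int),
                      "ulong4 is wider than four unsigned ints on this platform");

        ulong4 *UA = nullptr, *UA_h;
        size_t VectorSizeUlong4 = VECTOR_SIZE / 4;
        size_t VectorSizeBytesUlong4 = VectorSizeUlong4 * sizeof(ulong4);

        if (cudaMalloc( (void**)&UA, VectorSizeBytesUlong4) != cudaSuccess)
            return nullptr;
        UA_h = reinterpret_cast<ulong4*>(A); // not necessary but increases transparency
        // cudaMemcpy needs the transfer direction as its fourth argument
        if (cudaMemcpy(UA, UA_h, VectorSizeBytesUlong4, cudaMemcpyHostToDevice) != cudaSuccess)
        {
            cudaFree(UA);
            return nullptr;
        }

        return UA;
    }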
    

    [Usual disclaimer: written in browser, not tested or compiled, use at own risk]
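If unsigned long is 64 bits on your platform, or you want to keep the element-by-element loop from the question, the loop has to write into host memory rather than into UA: the pointer returned by cudaMalloc is a device address, and dereferencing it in ordinary host code is generally what produces the access violation. The following is a minimal sketch of that approach, not a tested drop-in. It assumes a hypothetical helper named widen_to_quads (not part of CUDA), templated on the destination type so it can be used with ulong4 or uint4 from vector_types.h. It packs each group of four unsigned ints into one vector element on the host, after which a single cudaMemcpy moves the whole staging buffer to the device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of CUDA): packs groups of four 32-bit values
// into one element of any vector type with x, y, z, w members, such as
// CUDA's ulong4 or uint4. Runs entirely in host memory.
template <class Quad>
std::vector<Quad> widen_to_quads(const unsigned int* a, std::size_t n)
{
    assert(n % 4 == 0 && "n must be a multiple of 4");
    std::vector<Quad> out(n / 4);
    for (std::size_t i = 0; i < n / 4; ++i)
    {
        // Each component is filled by ordinary integer conversion,
        // so this is correct whatever sizeof(Quad) turns out to be.
        out[i].x = a[i * 4 + 0];
        out[i].y = a[i * 4 + 1];
        out[i].z = a[i * 4 + 2];
        out[i].w = a[i * 4 + 3];
    }
    return out;
}
```

In the .cu file this could replace the loop from the question, after UA has been allocated as before:

    std::vector<ulong4> staged = widen_to_quads<ulong4>(A, VECTOR_SIZE);
    cudaMemcpy(UA, staged.data(), staged.size() * sizeof(ulong4), cudaMemcpyHostToDevice);

The cost is one extra host buffer. In exchange, the source and destination byte counts are both derived from sizeof(ulong4), so the copy stays in bounds on both 32-bit-long and 64-bit-long hosts.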