Allocate 2D Array on Device Memory in CUDA

How do I allocate 2D arrays in device memory in CUDA, and transfer them to and from the host?


  • I found a solution to this problem. I didn't have to flatten the array.

    The built-in cudaMallocPitch() function did the job, and I could transfer the array to and from the device using the cudaMemcpy2D() function.

    For example:

    cudaMallocPitch((void**) &array, &pitch, a*sizeof(float), b);

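    This allocates a 2D array of b rows, each holding a floats (a*sizeof(float) bytes). The pitch is not an input: cudaMallocPitch() writes it to the second parameter. It is the actual size of each row in bytes, including any padding added for alignment.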

    The following code allocates a pitched 2D array on the device and loops over its elements in a kernel. It compiles as is with nvcc, so feel free to use it.

    #define height 50
    #define width 50

    // Device code: every thread walks the whole array, row by row.
    __global__ void kernel(float* devPtr, size_t pitch)
    {
        for (int r = 0; r < height; ++r) {
            // pitch is in bytes, so step to row r through a char* cast.
            float* row = (float*)((char*)devPtr + r * pitch);
            for (int c = 0; c < width; ++c) {
                float element = row[c];
            }
        }
    }

    // Host code
    int main()
    {
        float* devPtr;
        size_t pitch;
        cudaMallocPitch((void**)&devPtr, &pitch, width * sizeof(float), height);
        kernel<<<100, 512>>>(devPtr, pitch);
        cudaDeviceSynchronize();  // wait for the kernel to finish
        cudaFree(devPtr);
        return 0;
    }
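
    The code above only allocates the array; it does not show the transfer with cudaMemcpy2D() mentioned earlier. Below is a minimal sketch of a full round trip, assuming a tightly packed row-major host array. The names scaleKernel and roundTripPitched are hypothetical, chosen only for illustration, and the kernel uses one thread per element instead of looping in every thread.

    ```cuda
    #include <vector>
    #include <cuda_runtime.h>

    // Hypothetical kernel: one thread per element, doubles each value in place.
    __global__ void scaleKernel(float* devPtr, size_t pitch, int width, int height)
    {
        int c = blockIdx.x * blockDim.x + threadIdx.x;  // column index
        int r = blockIdx.y * blockDim.y + threadIdx.y;  // row index
        if (r < height && c < width) {
            // pitch is a byte count, so advance to row r through a char* cast.
            float* row = (float*)((char*)devPtr + r * pitch);
            row[c] *= 2.0f;
        }
    }

    // Hypothetical helper: copies a width x height host array to the device,
    // doubles it there, and copies it back. Returns the number of elements
    // with an unexpected value, or -1 if a CUDA call failed.
    int roundTripPitched(int width, int height)
    {
        std::vector<float> host(width * height);
        for (int i = 0; i < width * height; ++i) host[i] = (float)i;

        float* devPtr = nullptr;
        size_t pitch = 0;
        size_t rowBytes = width * sizeof(float);
        if (cudaMallocPitch((void**)&devPtr, &pitch, rowBytes, height) != cudaSuccess)
            return -1;

        // Host rows are tightly packed (source pitch = rowBytes);
        // device rows are pitch bytes apart.
        cudaError_t err = cudaMemcpy2D(devPtr, pitch, host.data(), rowBytes,
                                       rowBytes, height, cudaMemcpyHostToDevice);

        dim3 block(16, 16);
        dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
        if (err == cudaSuccess) {
            scaleKernel<<<grid, block>>>(devPtr, pitch, width, height);
            err = cudaGetLastError();
        }
        // The copy back runs after the kernel on the default stream.
        if (err == cudaSuccess)
            err = cudaMemcpy2D(host.data(), rowBytes, devPtr, pitch,
                               rowBytes, height, cudaMemcpyDeviceToHost);
        cudaFree(devPtr);
        if (err != cudaSuccess) return -1;

        int wrong = 0;
        for (int i = 0; i < width * height; ++i)
            if (host[i] != 2.0f * i) ++wrong;
        return wrong;
    }
    ```

    Calling roundTripPitched(50, 50) from main() should return 0 on a machine with a working CUDA device. Note that the width argument of cudaMemcpy2D() is given in bytes, while the height is given in rows.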