How do I feed a 2-dimensional array into a kernel with pycuda?


I have created a numpy array of float32s with shape (64, 128), and I want to send it to the GPU. How do I do that? What arguments should my kernel function accept? float** myArray?

I have tried directly sending the array as it is to the GPU, but pycuda complains that objects are being accessed...


Solution

  • Two-dimensional arrays in numpy/PyCUDA are stored in contiguous linear memory (not pitched), in row-major order by default. The kernel therefore takes a plain float* plus the row length rather than a float**, so you only need a kernel something like this:

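    __global__
    void kernel(float* a, int lda, ...)
    {
        // lda: row length in elements (128 for a C-contiguous (64, 128) array)
        int r0 = threadIdx.y + blockDim.y * blockIdx.y;  // column index
        int r1 = threadIdx.x + blockDim.x * blockIdx.x;  // row index
    
        // row-major: element (r1, r0) lives at offset r1*lda + r0
        float val = a[r0 + r1*lda];
    
        ....
    }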
    

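    to access a numpy ndarray (wrapped in pycuda.driver.In, Out, or InOut so that PyCUDA copies it to and from the device) or a PyCUDA gpuarray passed to the kernel from Python. Scalar arguments such as lda must be passed as sized numpy types, e.g. numpy.int32.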
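As a minimal host-side sketch of this approach, the following shows how the flat row-major offset relates to the 2-D index and how such a kernel might be launched. The kernel name scale2d, the helpers flat_index and run_on_gpu, and the nrows/ncols bounds arguments are hypothetical names introduced for illustration; they are not part of the original answer. Here threadIdx.x is mapped to the column so that neighbouring threads read neighbouring addresses. The GPU part is wrapped in a function and only runs when called, since it needs PyCUDA and a CUDA device.

```python
import numpy as np

# Hypothetical kernel: doubles every element of a row-major 2-D float array.
KERNEL_SRC = r"""
__global__ void scale2d(float *a, int lda, int nrows, int ncols)
{
    int col = threadIdx.x + blockDim.x * blockIdx.x;
    int row = threadIdx.y + blockDim.y * blockIdx.y;
    if (row < nrows && col < ncols)      // guard against partial blocks
        a[row * lda + col] *= 2.0f;      // row-major offset
}
"""


def flat_index(row, col, lda):
    """Offset (in elements) of (row, col) in a C-contiguous 2-D array."""
    return row * lda + col


def run_on_gpu(a):
    """Return a doubled copy of `a`; requires PyCUDA and a CUDA device."""
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    # Work on a C-contiguous float32 copy so the caller's array is untouched.
    a = np.array(a, dtype=np.float32, order="C")
    nrows, ncols = a.shape
    func = SourceModule(KERNEL_SRC).get_function("scale2d")

    block = (16, 16, 1)
    grid = ((ncols + 15) // 16, (nrows + 15) // 16)
    # drv.InOut copies the array to the device and back after the launch;
    # scalars must be sized numpy types to match the kernel's int parameters.
    func(drv.InOut(a), np.int32(ncols), np.int32(nrows), np.int32(ncols),
         block=block, grid=grid)
    return a


# CPU-only check of the layout assumption for the (64, 128) array.
a = np.arange(64 * 128, dtype=np.float32).reshape(64, 128)
lda = a.strides[0] // a.itemsize  # row length in elements
```

On a machine with a CUDA device, run_on_gpu(a) would be expected to return an array equal to 2 * a. A gpuarray created with pycuda.gpuarray.to_gpu(a) should also be usable in place of drv.InOut(a), in which case the result stays on the device until .get() is called.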