Tags: python, array, list, gpu, pycuda

pycuda: Memory allocation in gpu for a list


I want to run a simple pycuda program to update a list on the GPU. Following is my list: dm_count = [[0], [1, 2], [3, 4, 5], [6, 7, 8, 9]]. I have this list as the input and expect to update it in parallel. It throws an exception when I try to allocate memory on the GPU using mem_alloc().

It gives an AttributeError saying "'list' object has no attribute 'nbytes'". When I searched for answers, some say to convert the list to an array, since nbytes cannot be applied otherwise. That seems to work only for rectangular arrays in the format [[1,1],[1,1],[2,4]]. But I don't want to change the list. What is the way to allocate memory on the GPU while keeping the list in its original format?

I am also not sure whether memcpy_dtoh() works correctly. How can I correct this program to yield the expected outcome?

import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy

dm_count = [[0], [1, 2], [3, 4, 5], [6, 7, 8, 9]]
length = len(dm_count)

mod = SourceModule("""
__global__ void UpdateMatrix(int **dm_count, int length)
    {
       int row = threadIdx.x + blockIdx.x*blockDim.x;
       int col = threadIdx.y + blockIdx.y*blockDim.y;
       if( (row < length) && (col< row)){
            dm_count[row][col] = 0 ; 
       }
    }
        """)


dm_gpu = cuda.mem_alloc(dm_count.nbytes)
cuda.memcpy_htod(dm_gpu, dm_count)
func = mod.get_function("UpdateMatrix")
func(dm_gpu, block=(length, length, 1))
result = numpy.empty_like(dm_count)
cuda.memcpy_dtoh(result, dm_gpu)
print(result)

Expected Result: result = [[0], [0, 2], [0, 0, 5], [0, 0, 0, 9]]

Error Message:

Traceback (most recent call last):
  File "test_pycuda.py", line 55, in <module>
    dm_gpu = cuda.mem_alloc(dm_count.nbytes)
AttributeError: 'list' object has no attribute 'nbytes'


Solution

  • I want to run a simple pycuda program to update a list on the gpu

    It is not possible to manipulate a Python list in PyCUDA. In general, PyCUDA can only deal with numpy arrays of a limited set of dtypes, and other objects which support the Python buffer protocol.

    As a result, you could potentially re-write your code to use a numpy array of a suitable dtype as input to the kernel, although you would have to devise a representation of the jagged array that is compatible with a contiguous numpy array. You would then need to write the CUDA kernel against the format you devise. (Note that your current kernel is broken in a number of ways, which means it would not work even if the list were accepted as an input by PyCUDA.)
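As a sketch of one such representation (my own illustration, not part of the original answer): flatten the jagged list into one contiguous int32 buffer plus a per-row offset table, which is exactly the kind of layout a kernel can index. The loop below emulates on the CPU what each GPU thread would do; the names `flat` and `offsets` are assumptions for the sake of the example.

```python
import numpy as np

dm_count = [[0], [1, 2], [3, 4, 5], [6, 7, 8, 9]]

# Flatten the jagged list into a contiguous int32 buffer, plus an offset
# table giving the start index of each row (one extra entry marks the end).
flat = np.array([v for row in dm_count for v in row], dtype=np.int32)
offsets = np.array([0] + list(np.cumsum([len(r) for r in dm_count])),
                   dtype=np.int32)

# CPU emulation of the intended kernel: zero element (row, col) when col < row.
# On the GPU, each (row, col) pair would be handled by one thread.
length = len(dm_count)
for row in range(length):
    for col in range(offsets[row + 1] - offsets[row]):
        if col < row:
            flat[offsets[row] + col] = 0

# Reassemble the jagged structure from the flat buffer.
result = [flat[offsets[i]:offsets[i + 1]].tolist() for i in range(length)]
print(result)  # [[0], [0, 2], [0, 0, 5], [0, 0, 0, 9]]
```

With this layout, both `flat` and `offsets` are ordinary contiguous numpy arrays, so `cuda.mem_alloc(flat.nbytes)` and `memcpy_htod` work, and a kernel would perform the same guarded write, along the lines of `dm[offsets[row] + col] = 0` when `row < length && col < row`.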