How to have a list of memory views in Cython?

My function takes in a list of differently sized numpy arrays:

def function1(list list_of_numpy_arrays):

Right now I am doing:

cdef int[:] a_view = list_of_numpy_arrays[index]

The problem is that I have to index the list a large number of times, and the overhead makes the function about 10x slower. I am looking for something like cdef int[:] a[5], an array of memory views, so that I can avoid the overhead of indexing Python lists.

I am also able to pass in a list of lists if there is a solution for that.

def function2(list list_of_lists):

Solution

What you're after isn't really possible in Cython. If you want something that performs well, I'd create a C struct that holds the relevant information from the memoryview and use that instead. This isn't a very elegant solution, but it gives similar performance to using memoryviews. I wouldn't recommend making it a common pattern, but for a one-off problem where your data requires it, it's OK.

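cdef struct FakeMemoryView:
    int* data            # pointer to the first element
    Py_ssize_t stride    # step between elements, counted in ints (not bytes)
    Py_ssize_t length    # number of elements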

If you are prepared to force C-contiguous memoryviews (int[::1]) then you can drop stride, since it is known to be one. The data is indexed using var.data[i*var.stride]. At the start of your function, loop through the Python list to create an array of these FakeMemoryViews, then from that point on use only this array:

def function1(list list_of_numpy_arrays):
    assert len(list_of_numpy_arrays) == 5

    cdef FakeMemoryView new_list[5]
    cdef int[:] mview
    cdef Py_ssize_t i

    # initialize the array of structs; each input must be non-empty,
    # otherwise &mview[0] is out of bounds
    for i in range(5):
        mview = list_of_numpy_arrays[i]
        new_list[i].data = &mview[0]
        # memoryview strides are in bytes; convert to a count of ints
        new_list[i].stride = mview.strides[0] // <Py_ssize_t>sizeof(int)
        new_list[i].length = mview.shape[0]

    # example access - zero the first lot of data
    for i in range(new_list[0].length):
        new_list[0].data[i*new_list[0].stride] = 0
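
For the C-contiguous case mentioned above, the struct shrinks to a pointer and a length. This is a minimal sketch, assuming every input is a non-empty, C-contiguous array of C ints (for example dtype np.intc). ContiguousView and function1_contiguous are hypothetical names used only for illustration:

```cython
cdef struct ContiguousView:
    int* data
    Py_ssize_t length

def function1_contiguous(list list_of_numpy_arrays):
    assert len(list_of_numpy_arrays) == 5

    cdef ContiguousView views[5]
    cdef int[::1] mview   # assignment fails if an array is not C contiguous
    cdef Py_ssize_t i, j
    cdef long total = 0

    for i in range(5):
        mview = list_of_numpy_arrays[i]
        views[i].data = &mview[0]        # requires a non-empty array
        views[i].length = mview.shape[0]

    # example access - sum every element; the stride is implicitly 1
    for i in range(5):
        for j in range(views[i].length):
            total += views[i].data[j]
    return total
```

The trade-off is that non-contiguous inputs (such as a slice like arr[::2]) are rejected when assigned to mview, so the caller would need to copy them first, for instance with np.ascontiguousarray.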

If you don't know the length of the list in advance then you need to handle the memory for it yourself with malloc and free.
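
A sketch of the malloc/free version might look like the following. It assumes non-empty arrays of C ints; zero_all is a hypothetical function name, and the struct is repeated so that the block stands alone. The try/finally makes sure the buffer is freed even if one of the memoryview assignments raises:

```cython
from libc.stdlib cimport malloc, free

cdef struct FakeMemoryView:
    int* data
    Py_ssize_t stride   # counted in ints, not bytes
    Py_ssize_t length

def zero_all(list list_of_numpy_arrays):
    cdef Py_ssize_t n = len(list_of_numpy_arrays)
    cdef Py_ssize_t i, j
    cdef int[:] mview
    cdef FakeMemoryView* views

    if n == 0:
        return  # nothing to do; malloc(0) may legitimately return NULL

    views = <FakeMemoryView*>malloc(n * sizeof(FakeMemoryView))
    if views == NULL:
        raise MemoryError()
    try:
        for i in range(n):
            mview = list_of_numpy_arrays[i]
            views[i].data = &mview[0]   # requires a non-empty array
            # memoryview strides are in bytes; convert to a count of ints
            views[i].stride = mview.strides[0] // <Py_ssize_t>sizeof(int)
            views[i].length = mview.shape[0]

        # example access - zero every array through the raw pointers
        for i in range(n):
            for j in range(views[i].length):
                views[i].data[j * views[i].stride] = 0
    finally:
        free(views)
```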

This solution does not handle reference counting for the NumPy arrays, so you must not allow them to be deallocated while you hold FakeMemoryViews pointing into them. Don't store the array of structs for longer than a single function call, and don't drop arrays from the input list while it is in use.