
How to create an array of pointers to arrays of various lengths and pass it from Cython to Python


I would like to create an array of pointers in Cython. Each pointer should point to an array of varying size, and the final structure should be returned to Python. Is that possible, or am I missing something? Non-working code is posted below; it may just be a question of proper definitions. It refuses to compile with the message: "Buffers with pointer types not yet supported".

Any help is appreciated!

# cython: language_level=3
from cpython cimport array as arr

def mainFunc():
    ArrayOfPointers = ArrayOfPointersGenerator("path")
    print(ArrayOfPointers[0][1]) ## 12
    print(ArrayOfPointers[1][2]) ## 23
    return 0

def ArrayOfPointersGenerator(path):
    cdef arr.array[unsigned int] Message1, Message2
    cdef unsigned int *PonterToMessage1, *PonterToMessage2
    cdef arr.array[unsigned int *] ArrayOfPointersToArrays

    Message1 = arr.clone(arr.array('I'), 2, False)
    Message1[0]=11
    Message1[1]=12

    Message2 = arr.clone(arr.array('I'), 3, False)
    Message2[0]=21
    Message2[1]=22
    Message2[2]=23

    PonterToMessage1 = <unsigned int*>Message1[0]
    PonterToMessage2 = <unsigned int*>Message2[0]

    ArrayOfPointersToArrays = arr.clone(arr.array('I'), 2, False)

    ArrayOfPointersToArrays[0]=PonterToMessage1
    ArrayOfPointersToArrays[1]=PonterToMessage2

    return ArrayOfPointersToArrays

Solution

  • First thing to bear in mind: if your code had compiled then it would have been a memory management disaster. As soon as you start to use pointers you are taking responsibility for ensuring that they point somewhere valid, and Cython can't handle that for you.

    In this case

    PonterToMessage1 = <unsigned int*>&Message1[0]
    PonterToMessage2 = <unsigned int*>&Message2[0]
    

    are intended to point to data inside Message1 and Message2 (note that I've added a & which wasn't present in your code; without it you cast the value of the first element, not its address, to a pointer). However, nothing links the lifetime of Message1 and Message2 to that of ArrayOfPointersToArrays, so they are probably deallocated at the end of the function, leaving the stored pointers dangling.

    If you are not confident with this idea then I suggest not using pointers at all.


    With that in mind:

    • The simplest solution is probably a list of numpy arrays (not a list of lists). That way you still get the benefits of the data being efficiently packed in the arrays, but you also get the memory managed for free and the ability to access things in Python.
    • If you just want to pass this big block of data around through Python (and not access it), you could write a cdef class to wrap some C-allocated pointers:
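      # note - untested code
      from libc.stdlib cimport malloc, calloc, free

      cdef class RaggedArrayWrapper:
          cdef int** data
          cdef int length
          def __cinit__(self, lengths: list):
              # calloc zeroes the row pointers, so __dealloc__ stays safe
              # even if a later row allocation fails part-way through
              self.data = <int**>calloc(len(lengths), sizeof(int*))
              if self.data == NULL and len(lengths) > 0:
                  raise MemoryError()
              self.length = len(lengths)
              for n, l in enumerate(lengths):
                  self.data[n] = <int*>malloc(sizeof(int)*l)
                  if self.data[n] == NULL:
                      raise MemoryError()
          def __dealloc__(self):
              if self.data != NULL:
                  for i in range(self.length):
                      free(self.data[i])  # free(NULL) is a no-op
                  free(self.data)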
      
      The disadvantage of this approach is that you are responsible for your own memory management and for keeping track of the bounds of each row. However, it isn't too hard to wrap all of that in a constructor/destructor pair, as above.
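    • Finally, you could use a NumPy array filled with uintptr_t (this has the NumPy dtype np.uintp). This is an unsigned integer large enough to hold a pointer. You'll need to cast your pointers to this integer type when storing them and back to a pointer type when reading them. If you choose to do this, you are again responsible for your own memory management: the array stores only addresses and does nothing to keep the pointed-to data alive.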
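
The first option (a list of packed arrays) can be sketched in plain Python, which Cython will also compile. This is only an illustration under stated assumptions: it uses the standard-library array module rather than NumPy so that it has no dependencies, and the function name array_of_arrays_generator is a hypothetical stand-in for ArrayOfPointersGenerator from the question. NumPy rows built with dtype np.uint32 would be stored in the list in the same way.

```python
from array import array


def array_of_arrays_generator():
    # Hypothetical replacement for ArrayOfPointersGenerator: each row is a
    # packed buffer of unsigned ints, and rows may have different lengths.
    message1 = array('I', [11, 12])
    message2 = array('I', [21, 22, 23])
    # The list holds a reference to every row, so the rows live exactly as
    # long as the list does; no pointers or manual freeing are involved.
    return [message1, message2]


rows = array_of_arrays_generator()
print(rows[0][1])  # 12
print(rows[1][2])  # 23
```

Because the list owns references to its rows, the lifetime problem described above does not arise. If fast element access is needed inside Cython, each row should be viewable through a typed memoryview (for example `cdef unsigned int[:] view = rows[0]`), although that part is not exercised in this sketch.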