Tags: python, cython, typed-memory-views, pep3118

Cython - Memoryview of a dynamic 2D C++ array


The Goal: Get a Memoryview from a 2D C++ char array using Cython.

A little background:

I have a native C++ library which generates some data and returns it via a char** to the Cython world. The array is initialized and filled in the library roughly like this:

struct Result_buffer {
    char** data_pointer;
    int length = 0;

    Result_buffer(int row_capacity) {
        // allocate the outer array of row pointers
        data_pointer = new char*[row_capacity];
    }

    // the actual data is appended row by row
    void append_row(char* row_data) {
        data_pointer[length] = row_data;
        length++;
    }
};

So we basically get an array of nested sub-arrays.

Side Notes:
- each row has the same count of columns
- rows can share memory, i.e. point to the same row_data
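
To make the layout concrete, here is a minimal C++ sketch of how such a buffer might be filled. The helpers `rows_share_memory` and `demo_shared_rows` are hypothetical names introduced only for illustration, and the destructor is an addition that the original struct does not show.

```cpp
#include <cassert>
#include <vector>

// Struct from the question; the destructor is an assumption added for this sketch.
struct Result_buffer {
    char** data_pointer;
    int length = 0;

    explicit Result_buffer(int row_capacity)
        : data_pointer(new char*[row_capacity]) {}

    // Frees only the table of row pointers, never the rows themselves.
    ~Result_buffer() { delete[] data_pointer; }

    Result_buffer(const Result_buffer&) = delete;
    Result_buffer& operator=(const Result_buffer&) = delete;

    void append_row(char* row_data) {
        data_pointer[length] = row_data;
        length++;
    }
};

// Hypothetical helper: true when rows i and j point to the same row_data.
bool rows_share_memory(const Result_buffer& buf, int i, int j) {
    return buf.data_pointer[i] == buf.data_pointer[j];
}

// Hypothetical demo: a 3 x 4 buffer in which row 0 and row 2 share one allocation.
bool demo_shared_rows() {
    const int cols = 4;
    std::vector<char> row_a(cols, 1);
    std::vector<char> row_b(cols, 2);

    Result_buffer buf(3);
    buf.append_row(row_a.data());
    buf.append_row(row_b.data());
    buf.append_row(row_a.data());  // same memory as row 0

    row_a[1] = 9;  // a write through one row is visible through the other
    return rows_share_memory(buf, 0, 2)
        && !rows_share_memory(buf, 0, 1)
        && buf.data_pointer[2][1] == 9
        && buf.length == 3;
}
```

The point of the sketch is that the rows are independent allocations reached through a table of pointers, so nothing guarantees that row 1 starts where row 0 ends.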

The goal is to use this array with a memoryview, preferably without expensive memory copying.


First Approach (not working):

Using Cython arrays and memoryviews:

Here's the .pyx file which should consume the generated data:

from cython cimport view
cimport numpy as np
import numpy as np

[...]

def raw_data_to_numpy(self):

    # Dimensions of the source array
    cdef int ROWS = self._row_count
    cdef int COLS = self._col_count

    # This is the array from the C++ library and is created by 'create_buffer()'
    cdef char** raw_data_pointer = self._raw_data

    # It only works with a pointer to the first nested array
    cdef char* pointer_to_0 = raw_data_pointer[0]

    # Now create a 2D Cython array
    cdef view.array cy_array = <char[:ROWS, :COLS]> pointer_to_0

    # With this we can finally create our NumPy array:
    return np.asarray(cy_array)

This actually compiles fine and runs without crashing, but the result isn't quite what I expected. If I print out the values of the NumPy array I get this:

000: [1, 2, 3, 4, 5, 6, 7, 8, 9]
001: [1, 0, 0, 0, 0, 0, 0, 113, 6]
002: [32, 32, 32, 32, 96, 96, 91, 91, 97]
[...]

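It turns out that the first row was mapped correctly, but the other rows look like uninitialized memory. The cast <char[:ROWS, :COLS]> treats pointer_to_0 as the start of one contiguous block of ROWS * COLS bytes, whereas a char** is a table of pointers to rows that may live anywhere in memory. So there is probably a mismatch between the memory layout of char** and the default mode of 2D memoryviews.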
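
The mismatch can be reproduced in plain C++. The following is a minimal sketch, not the library's code: `read_contiguous`, `read_indirect` and `LayoutDemo` are hypothetical names. To keep the "wrong" read inside valid memory, the demo places all rows in one backing block and only reorders the pointer table; in the real case the rows are separate allocations, so the contiguous read runs into unrelated memory.

```cpp
#include <cassert>
#include <vector>

// Reads element (i, j) the way a C-contiguous 2D view does:
// one base pointer, rows assumed to follow each other in memory.
char read_contiguous(const char* base, int cols, int i, int j) {
    return base[i * cols + j];
}

// Reads element (i, j) the way a char** has to be read:
// fetch the row pointer first, then index into that row.
char read_indirect(char* const* rows, int i, int j) {
    return rows[i][j];
}

// Hypothetical demo data: 3 rows x 4 columns, row r of the storage is filled
// with the value r, but the pointer table lists the rows in the order 0, 2, 1.
struct LayoutDemo {
    std::vector<char> storage;
    std::vector<char*> rows;
    int cols = 4;

    LayoutDemo() : storage(3 * 4) {
        for (int r = 0; r < 3; ++r)
            for (int c = 0; c < cols; ++c)
                storage[r * cols + c] = static_cast<char>(r);
        rows = { &storage[0], &storage[2 * cols], &storage[1 * cols] };
    }

    // The table points into this object's own storage, so copying is disabled.
    LayoutDemo(const LayoutDemo&) = delete;
    LayoutDemo& operator=(const LayoutDemo&) = delete;
};
```

With this data, `read_indirect(d.rows.data(), 1, 0)` follows the table and yields 2, while `read_contiguous(d.rows[0], d.cols, 1, 0)` simply walks past the end of row 0 and yields 1. Both agree on row 0, which matches the observation that only the first row came out right.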


Edit #1: What I've learned from my other question is that the built-in Cython arrays don't support indirect memory layouts, so I have to create a Cython wrapper for the unsigned char** which exposes the buffer protocol.


Solution

  • The Solution:

    Manually implement the buffer-protocol:

    The wrapper class which wraps the unsigned char** and implements the buffer-protocol (Indirect2DArray.pyx):

    cdef class Indirect2DArray:
        cdef Py_ssize_t len
        cdef unsigned char** raw_data
        cdef int ndim
        cdef Py_ssize_t item_size
        cdef Py_ssize_t strides[2]
        cdef Py_ssize_t shape[2]
        cdef Py_ssize_t suboffsets[2]
    
    
        def __cinit__(self,int nrows,int ncols):
            self.ndim = 2
            self.len = nrows * ncols
            self.item_size = sizeof(unsigned char)
    
            self.shape[0] = nrows
            self.shape[1] = ncols
    
            self.strides[0] = sizeof(void*)
            self.strides[1] = sizeof(unsigned char)
    
            self.suboffsets[0] = 0
            self.suboffsets[1] = -1
    
    
        cdef set_raw_data(self, unsigned char** raw_data):
            self.raw_data = raw_data        
    
        def __getbuffer__(self,Py_buffer * buffer, int flags):
            if self.raw_data is NULL:
                raise Exception("raw_data was NULL when calling __getbuffer__. Use set_raw_data(...) before the buffer is requested!")
    
            buffer.buf = <void*> self.raw_data
            buffer.obj = self
            buffer.ndim = self.ndim
            buffer.len = self.len
            buffer.itemsize = self.item_size
            buffer.shape = self.shape
            buffer.strides = self.strides
            buffer.suboffsets = self.suboffsets
            buffer.format = "B" # unsigned bytes
    
    
        def __releasebuffer__(self, Py_buffer * buffer):
            print("CALL TO __releasebuffer__")
    

    Note: I wasn't able to pass the raw pointer via the wrapper's constructor (presumably because __cinit__ only accepts Python objects as arguments), so I had to use a separate cdef function to set the pointer.
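
For readers wondering how a consumer interprets the strides and suboffsets set in __cinit__, the following C++ sketch mirrors the item lookup that the buffer protocol documentation describes for buffers with suboffsets. It is an illustration, not Cython's actual implementation: `std::ptrdiff_t` stands in for `Py_ssize_t`, and `lookup` is a hypothetical helper that hard-codes the same strides and suboffsets as the wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Resolves the address of the item at `indices`. A suboffset >= 0 means that
// the value found after striding along that dimension is itself a pointer,
// which is followed before the suboffset is added.
const char* get_item_pointer(int ndim, const void* buf,
                             const std::ptrdiff_t* strides,
                             const std::ptrdiff_t* suboffsets,
                             const std::ptrdiff_t* indices) {
    const char* pointer = static_cast<const char*>(buf);
    for (int i = 0; i < ndim; ++i) {
        pointer += strides[i] * indices[i];
        if (suboffsets[i] >= 0) {
            const char* next = nullptr;
            std::memcpy(&next, pointer, sizeof(next));  // load the stored row pointer
            pointer = next + suboffsets[i];
        }
    }
    return pointer;
}

// Hypothetical helper: element (row, col) of an unsigned char** table, using
// strides = {sizeof(void*), 1} and suboffsets = {0, -1} as in the wrapper.
unsigned char lookup(unsigned char** rows, std::ptrdiff_t row, std::ptrdiff_t col) {
    const std::ptrdiff_t strides[2] = {
        static_cast<std::ptrdiff_t>(sizeof(void*)),
        static_cast<std::ptrdiff_t>(sizeof(unsigned char))
    };
    const std::ptrdiff_t suboffsets[2] = { 0, -1 };
    const std::ptrdiff_t indices[2] = { row, col };
    return static_cast<unsigned char>(
        *get_item_pointer(2, rows, strides, suboffsets, indices));
}
```

In the first dimension the stride steps over row pointers (hence sizeof(void*)) and the suboffset 0 says "dereference here"; in the second dimension the suboffset -1 says "no dereference", so the stride of one byte moves within the row.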

    Here's its usage:

    def test_wrapper(self):
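        cdef int nrows = 10000
        cdef int ncols = 81

        cdef unsigned char** raw_pointer = self.raw_data
        # typed declaration, so that the cdef method set_raw_data is callable
        cdef Indirect2DArray wrapper = Indirect2DArray(nrows, ncols)
        wrapper.set_raw_data(raw_pointer)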
    
        # now create the memoryview:
        cdef unsigned char[::view.indirect_contiguous, ::1] view = wrapper
    
        # print some slices 
        print(list(view[0,0:30]))
        print(list(view[1,0:30]))
        print(list(view[2,0:30]))
    

    producing the following output:

    [1, 2, 3, 4, 5, 6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 1, 2, 3, 7, 8, 9, 1, 2, 3, 4, 5, 6, 2, 1, 4]
    [2, 1, 3, 4, 5, 6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 1, 2, 3, 7, 8, 9, 1, 2, 3, 4, 5, 6, 1, 2, 4]
    [3, 1, 2, 4, 5, 6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 1, 2, 3, 7, 8, 9, 1, 2, 3, 4, 5, 6, 1, 2, 3]
    

    This is exactly what I expected. Thanks to all who helped me.