numpy cython boost-python

What are the relative advantages of extending NumPy in Cython vs Boost.Python?


I need to speed up some algorithms working on NumPy arrays. They will use std::vector and some of the more advanced STL data structures.

I've narrowed my choices down to Cython (which now wraps most STL containers) and Boost.Python (which now has built-in support for NumPy).

I know from my experience as a programmer that sometimes it takes months of working with a framework to uncover its hidden issues (because they are rarely used as talking points by its disciples), so your help could potentially save me a lot of time.

What are the relative advantages and disadvantages of extending NumPy in Cython vs Boost.Python?


Solution

  • This is a very incomplete answer that only really covers a couple of small parts of it (I'll edit it if I think of anything more):


    Boost.Python doesn't appear to implement operator[] specifically for NumPy arrays. This means that operator[] comes from the base object class (which ndarray inherits from), so each call goes through the Python mechanisms to __getitem__ and indexing will be slow (close to Python speed). If you want fast indexing you'll have to do the pointer arithmetic yourself:

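    // rough gist - untested:
    
    // i,j,k are your indices
    
    // get_data() returns a char* and the strides are in bytes,
    // so compute the byte offset first and only then cast to double*
    char* data = array.get_data();
    // in reality you'd check the dtype - the data may not be a double...
    
    double data_element = *reinterpret_cast<double*>(
        data + array.strides(0)*i + array.strides(1)*j + array.strides(2)*k);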
    

    In contrast Cython has efficient indexing of numpy arrays built in automatically.
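
    The stride arithmetic itself can be tried out without Boost. The following is a minimal sketch in plain C++, assuming a 3D array of doubles; StridedView3D, element_at and make_c_contiguous_view are hypothetical names made up for illustration (they are not part of Boost.Python) and only stand in for the data pointer and byte strides that an ndarray exposes:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for what an ndarray exposes:
// a raw byte pointer plus per-axis shape and byte strides.
struct StridedView3D {
    const char* data;           // start of the buffer, as raw bytes
    std::ptrdiff_t shape[3];    // number of elements along each axis
    std::ptrdiff_t strides[3];  // step along each axis, in BYTES
};

// Read element (i, j, k), assuming the buffer really holds doubles.
inline double element_at(const StridedView3D& v,
                         std::ptrdiff_t i, std::ptrdiff_t j, std::ptrdiff_t k) {
    assert(i >= 0 && i < v.shape[0]);
    assert(j >= 0 && j < v.shape[1]);
    assert(k >= 0 && k < v.shape[2]);
    // Offset in bytes first, cast to the element type afterwards.
    const char* p = v.data + i * v.strides[0] + j * v.strides[1] + k * v.strides[2];
    return *reinterpret_cast<const double*>(p);
}

// Build a view over a C-contiguous (row-major) buffer of doubles.
inline StridedView3D make_c_contiguous_view(const std::vector<double>& buf,
                                            std::ptrdiff_t n0,
                                            std::ptrdiff_t n1,
                                            std::ptrdiff_t n2) {
    assert(static_cast<std::ptrdiff_t>(buf.size()) == n0 * n1 * n2);
    const std::ptrdiff_t item = static_cast<std::ptrdiff_t>(sizeof(double));
    StridedView3D v;
    v.data = reinterpret_cast<const char*>(buf.data());
    v.shape[0] = n0;
    v.shape[1] = n1;
    v.shape[2] = n2;
    v.strides[0] = n1 * n2 * item;  // skip one whole (n1 x n2) slab
    v.strides[1] = n2 * item;       // skip one row of n2 elements
    v.strides[2] = item;            // neighbouring elements
    return v;
}
```

    Working in bytes and casting last is what keeps this correct for non-contiguous arrays (slices, transposes), where the strides are not simple multiples of the row length.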


    Cython isn't great at things like std::vector (although it isn't absolutely terrible - you can usually trick it into doing what you want). One notable limitation is that cdef declarations can't be placed inside loops or if blocks, so C++ objects have to be declared up front, where they will be default constructed, and then assigned to/manipulated later (which can be somewhat inefficient). For anything beyond simple uses you do not want to be manipulating C++ types in Cython (instead it's better to write the code in C++ and then call it from Cython).
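
    As a rough illustration of the "write it in C++, call it from Cython" split, here is a minimal sketch of a kernel that keeps all the STL work on the C++ side and only takes a raw pointer and a length. The function name unique_in_order and what it computes are made up for this example; the Cython side would only need an extern declaration for it plus the array's data pointer and size:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical kernel: return the distinct values of the input,
// in the order in which they first appear.
// The pointer/length pair is what a wrapper would pass in from a
// contiguous 1D array of doubles.
std::vector<double> unique_in_order(const double* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::unordered_set<double> seen;  // STL container stays inside C++
    std::vector<double> out;
    for (std::size_t i = 0; i < n; ++i) {
        // insert() reports via .second whether the value was new
        if (seen.insert(data[i]).second) {
            out.push_back(data[i]);
        }
    }
    return out;
}
```

    Because the containers are constructed where they are needed, inside the C++ function, the default-construct-then-assign pattern never comes up in the Cython code.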

    A second limitation is that it struggles with non-type template parameters (template arguments that are values rather than types). One common example is std::array, whose size is a template argument. Depending on your planned code this may or may not be an issue.
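
    One possible way around this, sketched here under the assumption that the size is fixed and known in advance, is to hide the numeric template argument behind a plain alias in a C++ header, so that the wrapper only ever has to refer to a non-templated name. Vec3 and dot are hypothetical names chosen for this example:

```cpp
#include <array>
#include <cstddef>

// The numeric template argument is fixed here, on the C++ side,
// so wrapper code only needs to refer to the name Vec3.
using Vec3 = std::array<double, 3>;

// Small example function that works on the aliased type.
inline double dot(const Vec3& a, const Vec3& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

    This only helps when a handful of fixed sizes are enough; code that needs std::array with arbitrary sizes is probably better kept entirely in C++.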