Search code examples
numpypython-c-api

Reading many values from numpy C API


I'm trying to read many values (in sequence) from a large numpy array using the C API. I'd like a more efficient way than seperately using boost::python::extract(...) on each value. Something like getting a pointer to the first value and then just incrementing that pointer.

I've read through the numpy API docs and I can see that it's possible but am none the wiser on how to actually accomplish this. Can anyone point me to an example?


Solution

  • The Boost::Python API does not allow you to do that directly. You can use the Numpy C API to do it. Getting access to the underlying PyObject* can be done using the ptr() method of boost::python::object. Access to the data can be obtained using PyArray_DATA(PyObject*).

    Here's an example that multiplies a 2d Numpy array by a number. I've had some trouble finding out how to compile this on Mac OS (i.e., where the Numpy headers are), so I'll add this here: running

    import numpy; print numpy.get_include()
    

    in Python gives the correct include path. You can for example use that in a cmake Find module (see this, for example; you'll have to set the PYTHON_EXECUTABLE variable yourself).

    Update: I modified the code to handle the case in which the input array is not contiguous -- this happens for example whenever you use a slice, as in arr[::2, ::2]. The most straightforward way to handle this is to use PyArray_FROM_OTF; do note, however, that this will make a copy of the array unless the array is already in contiguous form. The advantage of this approach is that it can handle the case in which the input is any kind of nested sequence (e.g., a list of lists).

    If your array is so big that you would prefer avoiding making copies of it, you can use PyArray_STRIDES to get the stride information that helps you access the data. From http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.strides.html,

    The byte offset of element (i[0], i[1], ..., i[n]) in an array a is:

    offset = sum(np.array(i) * a.strides)

    (note that this is the byte offset, so you should add it to the result of PyArray_BYTES before you cast to final data type; also, this will not help if the array is improperly aligned, or if it has the wrong byte order)

    Here's the sample code:

    #include <boost/python.hpp>
    
    #include <exception>
    
    #include <numpy/arrayobject.h>
    
    using namespace boost::python;
    
    struct WrongSizeError : public std::exception {
      const char* what() const throw() { return "Unsupported array size."; }
    };
    
    struct WrongTypeError : public std::exception {
      const char* what() const throw() { return "Unsupported array type."; }
    };
    
    // Boost::Python needs the translators
    void translate_sz(const WrongSizeError& e)
    {
      PyErr_SetString(PyExc_RuntimeError, e.what());
    }
    
    void translate_ty(const WrongTypeError& e)
    {
      PyErr_SetString(PyExc_RuntimeError, e.what());
    }
    
    // multiply matrix of double (m) by f
    object multiply(numeric::array m, double f)
    {
      PyObject* m_obj = PyArray_FROM_OTF(m.ptr(), NPY_DOUBLE, NPY_IN_ARRAY);
      if (!m_obj)
        throw WrongTypeError();
    
      // to avoid memory leaks, let a Boost::Python object manage the array
      object temp(handle<>(m_obj));
    
      // check that m is a matrix of doubles
      int k = PyArray_NDIM(m_obj);
      if (k != 2)
        throw WrongSizeError();
    
      // get direct access to the array data
      const double* data = static_cast<const double*>(PyArray_DATA(m_obj));
    
      // make the output array, and get access to its data
      PyObject* res = PyArray_SimpleNew(2, PyArray_DIMS(m_obj), NPY_DOUBLE);
      double* res_data = static_cast<double*>(PyArray_DATA(res));
    
      const unsigned size = PyArray_SIZE(m_obj); // number of elements in array
      for (unsigned i = 0; i < size; ++i)
        res_data[i] = f*data[i];
    
      return object(handle<>(res)); // go back to using Boost::Python constructs
    }
    
    BOOST_PYTHON_MODULE(test)
    {
      // register exception translators
      register_exception_translator<WrongSizeError>(&translate_sz);
      register_exception_translator<WrongTypeError>(&translate_ty);
    
      // tell Boost::Python under what name the array is known
      numeric::array::set_module_and_type("numpy", "ndarray");
      def("multiply", multiply);
    
      // initialize the Numpy C API
      import_array();
    }