I'm trying to read many values (in sequence) from a large numpy array using the C API. I'd like a more efficient way than seperately using boost::python::extract(...) on each value. Something like getting a pointer to the first value and then just incrementing that pointer.
I've read through the numpy API docs and I can see that it's possible but am none the wiser on how to actually accomplish this. Can anyone point me to an example?
The Boost::Python API does not allow you to do that directly. You can use the Numpy C API to do it. Getting access to the underlying PyObject*
can be done using the ptr()
method of boost::python::object
. Access to the data can be obtained using PyArray_DATA(PyObject*)
.
Here's an example that multiplies a 2d Numpy array by a number. I've had some trouble finding out how to compile this on Mac OS (i.e., where the Numpy headers are), so I'll add this here: running
import numpy; print numpy.get_include()
in Python gives the correct include path. You can for example use that in a cmake Find module (see this, for example; you'll have to set the PYTHON_EXECUTABLE
variable yourself).
Update: I modified the code to handle the case in which the input array is not contiguous -- this happens for example whenever you use a slice, as in arr[::2, ::2]
. The most straightforward way to handle this is to use PyArray_FROM_OTF
; do note, however, that this will make a copy of the array unless the array is already in contiguous form. The advantage of this approach is that it can handle the case in which the input is any kind of nested sequence (e.g., a list of lists).
If your array is so big that you would prefer avoiding making copies of it, you can use PyArray_STRIDES
to get the stride information that helps you access the data. From http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.strides.html,
The byte offset of element
(i[0], i[1], ..., i[n])
in an arraya
is:
offset = sum(np.array(i) * a.strides)
(note that this is the byte offset, so you should add it to the result of PyArray_BYTES
before you cast to final data type; also, this will not help if the array is improperly aligned, or if it has the wrong byte order)
Here's the sample code:
#include <boost/python.hpp>
#include <exception>
#include <numpy/arrayobject.h>
using namespace boost::python;
struct WrongSizeError : public std::exception {
const char* what() const throw() { return "Unsupported array size."; }
};
struct WrongTypeError : public std::exception {
const char* what() const throw() { return "Unsupported array type."; }
};
// Boost::Python needs the translators
void translate_sz(const WrongSizeError& e)
{
PyErr_SetString(PyExc_RuntimeError, e.what());
}
void translate_ty(const WrongTypeError& e)
{
PyErr_SetString(PyExc_RuntimeError, e.what());
}
// multiply matrix of double (m) by f
object multiply(numeric::array m, double f)
{
PyObject* m_obj = PyArray_FROM_OTF(m.ptr(), NPY_DOUBLE, NPY_IN_ARRAY);
if (!m_obj)
throw WrongTypeError();
// to avoid memory leaks, let a Boost::Python object manage the array
object temp(handle<>(m_obj));
// check that m is a matrix of doubles
int k = PyArray_NDIM(m_obj);
if (k != 2)
throw WrongSizeError();
// get direct access to the array data
const double* data = static_cast<const double*>(PyArray_DATA(m_obj));
// make the output array, and get access to its data
PyObject* res = PyArray_SimpleNew(2, PyArray_DIMS(m_obj), NPY_DOUBLE);
double* res_data = static_cast<double*>(PyArray_DATA(res));
const unsigned size = PyArray_SIZE(m_obj); // number of elements in array
for (unsigned i = 0; i < size; ++i)
res_data[i] = f*data[i];
return object(handle<>(res)); // go back to using Boost::Python constructs
}
BOOST_PYTHON_MODULE(test)
{
// register exception translators
register_exception_translator<WrongSizeError>(&translate_sz);
register_exception_translator<WrongTypeError>(&translate_ty);
// tell Boost::Python under what name the array is known
numeric::array::set_module_and_type("numpy", "ndarray");
def("multiply", multiply);
// initialize the Numpy C API
import_array();
}