Conversion from C++ vector to Numpy ndarray is very slow

I use Boost.Python for the computationally intensive parts of a program, and it works quite well, except that passing arrays from C++ to Python and vice versa is very slow, to the point that it limits the overall efficiency of the program.

Here is an example to illustrate my point. On the C++ side, I return a fairly large matrix of type vector< vector<double> >. On the Python side, I call that function and try converting the resulting array using two different methods: numpy.array, and my own (probably quite naive) C++ implementation of a basic converter. The C++ part:

#include <boost/python.hpp>
#include <boost/python/numpy.hpp>
#include <boost/python/suite/indexing/vector_indexing_suite.hpp>

using namespace std;

typedef vector<double> vec;
typedef vector<vec> mat;

mat test()
{
    int n = 1e4;
    mat result(n, vec(n, 0.));
    return result;
}

namespace p = boost::python;
namespace np = boost::python::numpy;

np::ndarray convert_to_numpy(mat const & input)
{
    u_int n_rows = input.size();
    u_int n_cols = input[0].size();
    p::tuple shape = p::make_tuple(n_rows, n_cols);
    np::dtype dtype = np::dtype::get_builtin<double>();
    np::ndarray converted = np::zeros(shape, dtype);

    for (u_int i = 0; i < n_rows; i++)
    {
        for (u_int j = 0; j < n_cols; j++)
        {
            converted[i][j] = input[i][j];
        }
    }
    return converted;
}


BOOST_PYTHON_MODULE(hermite_cpp)
{
    using namespace boost::python;

    // Initialize numpy
    Py_Initialize();
    boost::python::numpy::initialize();

    class_<vec>("double_vec")
        .def(vector_indexing_suite<vec>())
        ;

    class_<mat>("double_mat")
        .def(vector_indexing_suite<mat>())
        ;

    def("convert_to_numpy", convert_to_numpy);
    def("test", test);
}

The python part:

import hermite_cpp as test  # the module name must match BOOST_PYTHON_MODULE(hermite_cpp)
import numpy as np
import time


def timeit(function):
    def wrapper(*args, **kwargs):
        tb = time.time()
        result = function(*args, **kwargs)
        te = time.time()
        print(te - tb)
        return result
    return wrapper


A = timeit(test.test)()
B = timeit(np.array)(A)
C = timeit(test.convert_to_numpy)(A)

The program prints the following timings, in seconds, for test.test, np.array and test.convert_to_numpy respectively:

0.56
36.68
26.56

Can the conversion be made faster? Or, even better, could the array be shared between NumPy and C++? I have googled around for a long time, but without much success.

Solution

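I've been doing these conversions the following way, and they have been quite fast. Instead of assigning through converted[i][j], which goes through Python's item-access machinery for every element, the function writes directly into the memory of an array allocated on the Python side, obtained through the buffer protocol: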

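void convert_to_numpy(const mat & input, p::object obj)
{
    // obj.ptr() is a borrowed reference, so it must not be decref'ed here.
    PyObject* pobj = obj.ptr();
    Py_buffer pybuf;
    // Ask for a writable, contiguous buffer and propagate the Python error on failure.
    if (PyObject_GetBuffer(pobj, &pybuf, PyBUF_WRITABLE) != 0)
        p::throw_error_already_set();
    double *data = static_cast<double*>(pybuf.buf);

    u_int n_rows = input.size();
    u_int n_cols = input[0].size();
    // Refuse to write past the end of the caller's array.
    if (pybuf.len < (Py_ssize_t)(sizeof(double) * n_rows * n_cols))
    {
        PyBuffer_Release(&pybuf);
        PyErr_SetString(PyExc_ValueError, "output buffer is too small");
        p::throw_error_already_set();
    }
    for (u_int i = 0; i < n_rows; i++)
    {
        for (u_int j = 0; j < n_cols; j++)
        {
            // Row-major layout: element (i, j) lives at offset i*n_cols + j.
            data[i*n_cols+j] = input[i][j];
        }
    }
    // Every successful PyObject_GetBuffer must be paired with PyBuffer_Release.
    PyBuffer_Release(&pybuf);
}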

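Then in Python, preallocate a flat float64 array with one element per matrix entry and pass it in as the output buffer: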

C = np.empty([10000*10000], dtype=np.float64)
timeit(test.convert_to_numpy)(A,C)

Timings in seconds (test.test, then the new convert_to_numpy):

0.557882070541
0.12882900238
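
The copy loop in the answer comes down to writing element (i, j) at offset i*n_cols + j of a flat buffer. Below is a minimal sketch of just that step, without Boost.Python, so it can be compiled and checked on its own. The helper name flatten_row_major and its capacity parameter are hypothetical and not part of the original answer; it assumes the output buffer holds doubles in row-major (C) order, which is what np.empty produces by default.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

typedef std::vector<double> vec;
typedef std::vector<vec> mat;

// Hypothetical helper: copy a rectangular vector-of-vectors into a
// caller-owned buffer in row-major order.
// Returns false, without writing anything, if the buffer is too small
// or if the rows do not all have the same length.
bool flatten_row_major(const mat& input, double* out, std::size_t capacity)
{
    const std::size_t n_rows = input.size();
    const std::size_t n_cols = n_rows ? input[0].size() : 0;

    if (n_rows * n_cols > capacity)
        return false;
    for (std::size_t i = 0; i < n_rows; ++i)
        if (input[i].size() != n_cols)
            return false;

    // Each row is contiguous in memory, so it can be copied in one call
    // to the slot starting at offset i * n_cols.
    for (std::size_t i = 0; i < n_rows; ++i)
        std::copy(input[i].begin(), input[i].end(), out + i * n_cols);
    return true;
}
```

In the Boost.Python function, out would be the pointer taken from the Py_buffer and capacity would be pybuf.len / sizeof(double). On the Python side the flat result can then be viewed as a matrix with C.reshape(n_rows, n_cols), which does not copy the data for a contiguous array.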