I'm working on a simulation project, and I'm trying to figure out the best way to pass matrices between Python and C++. I'm using Python's NumPy and C++'s Eigen library, and I'm using pybind11 to get them to communicate with each other.
In my code (see below), I first create some arrays using NumPy in my Python script, and then I pass these as parameters to the constructor of a C++ class I call rmodule, which is essentially going to be the numerical engine of my simulation. I want an instance of my C++ class to have these NumPy arrays as object attributes (so they can be easily referenced), but I'm wondering what the best way to do this is.
If I just do a type conversion from a NumPy array to an Eigen matrix, pybind11 will have to copy all of that data into the C++ object. Although this seems like a lot of overhead, I feel like it would be OK if the copying is fast compared to the computations I do with the matrices.
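One way to check whether the copy matters is to time it against a representative computation. The sketch below is only a rough stand-in: the function name `time_copy_vs_compute`, the array sizes, and the matrix-vector product are all assumptions for illustration, and `astype` is used to mimic a by-value conversion from NumPy's default float64 to the float32 that `VectorXf`/`MatrixXf` hold.

```python
import time
import numpy as np

def time_copy_vs_compute(S, C, repeats=100):
    """Return (seconds per copy, seconds per matrix-vector product)."""
    start = time.perf_counter()
    for _ in range(repeats):
        S.astype(np.float32)  # astype copies by default, like a by-value conversion
    copy_s = (time.perf_counter() - start) / repeats

    start = time.perf_counter()
    for _ in range(repeats):
        S @ C  # hypothetical stand-in for one step of the real computation
    compute_s = (time.perf_counter() - start) / repeats
    return copy_s, compute_s

# Made-up sizes; substitute the real stoichiometric matrix and concentrations.
rng = np.random.default_rng(0)
S = rng.random((500, 500))
C = rng.random(500)
copy_s, compute_s = time_copy_vs_compute(S, C)
print(f"copy: {copy_s:.2e} s, compute: {compute_s:.2e} s")
```

Since the constructor runs once while the time-stepping runs many times, a single copy at construction is amortized over every later step.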
My other choice is to pass only a reference to the NumPy arrays to my C++ instance. That way, the data is not copied back and forth between Python and C++: it's owned by Python and referenced by the C++ class. I think this may give me a performance speedup. However, I'm not sure if I'll run into trouble doing this. Will I have to work around the GIL in some way? What other things should I keep in mind if this is the better approach?
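As far as I understand pybind11's Eigen support, a NumPy array can only be mapped into an `Eigen::Ref` without a copy when its dtype matches the Eigen scalar type and its memory layout is compatible (column-major for a default Eigen matrix). A minimal sketch of preparing arrays on the Python side, assuming the C++ side uses float32 matrices; the helper name `prepare_for_eigen` is made up for illustration:

```python
import numpy as np

def prepare_for_eigen(arr, dtype=np.float32):
    """Return arr with the given dtype in column-major (Fortran) order.

    np.asfortranarray copies only when the dtype or layout has to change.
    """
    return np.asfortranarray(arr, dtype=dtype)

# NumPy defaults to float64 and row-major order, so this first call copies.
S = np.arange(6, dtype=np.float64).reshape(2, 3)
S_f = prepare_for_eigen(S)
print(S_f.dtype, S_f.flags["F_CONTIGUOUS"], np.shares_memory(S, S_f))

# An array that is already compatible is passed through without copying.
S_again = prepare_for_eigen(S_f)
print(np.shares_memory(S_f, S_again))
```

If the C++ object keeps a reference to the array, the Python side also has to keep that array alive and avoid reallocating it for as long as the C++ object uses it. On the GIL: pybind11 holds it by default while a bound C++ function runs, so it should only need attention if the C++ code releases it explicitly.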
TLDR: I'm using Python for File I/O and C++ for computations. Should I copy the data back and forth between Python and C++ or just have the data under Python's ownership and pass a reference to that data to C++?
Any help and advice is greatly appreciated.
C++ Code:
#include <pybind11/pybind11.h>
#include <pybind11/eigen.h> // NumPy <-> Eigen type conversions
#include <random>
#include <iostream>
#include "Eigen/Dense"
constexpr double R = 8.314; // Universal Gas Constant (J mol^-1 K^-1)
namespace py = pybind11;
using namespace Eigen;
class rmodule {
    /** Encapsulated time-stepping logic that
        can be easily constructed and referenced
        by the Python interpreter.
        :attributes:
            C - Concentration Vector
            F - Standard ΔGº_f of metabolites
            T - Temperature (ºK)
            S - Stoichiometric Matrix
    */
    VectorXf C;
    VectorXf F;
    double T = 0.0;
    MatrixXf S;
public:
    rmodule(VectorXf pyC, MatrixXf pyS, VectorXf pyF, double pyT) {
        /** Copies numpy array data into C++ Eigen classes. */
        C = pyC;
        S = pyS;
        F = pyF;
        T = pyT;
    }
    ~rmodule() { // TODO -- Will need to free data structures
        ;
    }
};
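PYBIND11_MODULE(reaction, m) {
    m.doc() = "ERT reaction module"; // TODO -- How to bind?
    py::class_<rmodule>(m, "rmodule")
        // Here is the Problem! What should I do here? Reference or value?
        // As written, the arguments are taken by value, so each NumPy array is
        // copied into an Eigen object (this needs <pybind11/eigen.h>).
        .def(py::init<VectorXf, MatrixXf, VectorXf, double>(),
             py::arg("C"), py::arg("S"), py::arg("F"), py::arg("T"))
        ;
}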
Python Code:
import parser
import reaction as react # the compiled module's name must match PYBIND11_MODULE(reaction, m)
import numpy as np
def main():
    """Program Driver"""
    P = parser.Parser("test1.txt")
    P.ReadData() # Builds numpy arrays
    C = P.GetC() # Initial Concentrations #
    S = P.GetS() # Stoichiometric Matrix #
    F = P.GetF() # Standard ΔGº #
    rmodule = react.rmodule(C, S, F, T=273.15)
if __name__ == "__main__":
    main()
Figured out a compromise! I'm going to copy the values from Python to C++ once, and then just pass references to the data from C++ to Python.
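A quick way to confirm on the Python side that a returned array is a reference rather than a fresh copy is to inspect its `owndata` flag. The sketch below uses a plain NumPy view as a stand-in for an array handed back by the C++ class; the helper name `is_reference` is hypothetical, and how the real binding returns its members (for example via pybind11's `reference_internal` return value policy) is an assumption not shown here.

```python
import numpy as np

def is_reference(arr):
    """True if arr views memory owned by another object rather than its own buffer."""
    return not arr.flags["OWNDATA"]

owner = np.zeros(5, dtype=np.float32)  # stand-in for data held by the C++ object
view = owner[:]                        # shares the owner's memory
copy = owner.copy()                    # has its own buffer

print(is_reference(view), is_reference(copy))

# Writes through a reference are visible to the owner; writes to a copy are not.
view[0] = 1.0
copy[1] = 2.0
print(owner[0], owner[1])
```

With references handed out this way, the C++ object has to outlive every array that views its data.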