I've worked on this for a couple weeks and can't make a reproducible example outside my codebase. That's why I need help! I'm not sure if this is a problem with pybind11 or Armadillo. It's not a problem with Carma since it happens in situations with no conversion going on.
EDIT: This actually does appear to be a bug in Carma and my MRE is working.
I've been trying to boil it down to an MRE and have been unsuccessful. I will explain what I know here, but since this isn't enough to reproduce the bug, what I need most are some ideas as to where to look. This is basically a very hard-to-reproduce memory corruption error (heap corruption, Windows fatal exception: code 0xc0000374). It seems to happen when I have a matrix A, not initialized, and assign a matrix to it large enough that Armadillo acquires memory. The crash happens when that memory is released.
I'm on Windows 10, using Armadillo 10.6.2 and pybind11 v2.7.1, compiling using Clang 11.
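// mat_container.cpp
#include "mat_container.h"
#include <iostream>  // std::cerr
MatrixContainer::MatrixContainer(size_t d) {
  // A is default-constructed (empty) and then assigned a d x d identity matrix.
  A = arma::Mat<double>(d, d, arma::fill::eye);
  std::cerr << "filled arma matrix\n";
}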
Binding code
#include <pybind11/numpy.h>
#include <pybind11/pybind11.h>
#include <armadillo>
#include <carma>
#include "mat_container.h"
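namespace py = pybind11;
PYBIND11_MODULE(example, m) {
  // The constructor in mat_container.cpp takes a single size_t, matching example.MC(11).
  py::class_<MatrixContainer>(m, "MC")
      .def(py::init<size_t>())
      .def_readwrite("A", &MatrixContainer::A);
}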
All I need to do to trigger the crash is call example.MC(11) from Python, and it crashes upon releasing memory in the matrix destructor of the member variable (not the temporary one assigned to it). Armadillo debug messages right before crash:
@ __cdecl arma::Mat<double>::~Mat(void) [eT = double] [this = 000001569470DBA0]
Mat::destructor: releasing memory
In my attempted MRE, I tried to reproduce the same structure, with binding code using this MatrixContainer class compiled in a separate library. I don't know what could be missing so my MRE doesn't reproduce the bug.
Weird things
- The crash also occurs when I assign a (np.ones but not np.eye) matrix to an object's matrix member variable. But this only happens on a class in my source code, not on the smaller test class I created!
- Calling set_size() before assigning to it doesn't solve the issue
- It seems specific to Armadillo (debugged with ARMA_EXTRA_DEBUG), because I don't get a similar error when I use another class with dynamically allocated memory like a vector

When A is using acquired memory and is replaced by a matrix using local memory, the crash occurs:
mc = MC(11)
mc.A = np.eye(3)
Arma debug messages right before crash, as mc.A is being assigned:
@ void __cdecl arma::Mat<double>::init_warm(arma::uword, arma::uword) [eT = double] [in_n_rows = 3, in_n_cols = 3]
Mat::init(): releasing memory
but not when it's the other way around. This runs successfully and does not crash when the acquired memory is released as A is destroyed:
mc = MC(3)
mc.A = np.eye(11)
And when an 11x11 A is replaced by another 11x11 matrix from numpy, the crash is again at Mat::destructor: releasing memory, as in the very first example without assignment from Python. What I deduce from these examples is that the crash is averted when an acquired memory matrix prepared by Carma (not Armadillo directly) is assigned to a local memory A.
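To make the "local" versus "acquired" distinction concrete, here is a minimal sketch. It is a simplified stand-in and not Armadillo's actual code: TinyMat and kPrealloc are hypothetical names, and the threshold of 16 elements is chosen to mirror arma_config::mat_prealloc. Matrices at or below the threshold live in a buffer inside the object, so nothing is freed for them, which is why a 3x3 matrix (9 elements) and an 11x11 matrix (121 elements) behave differently in the examples above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical threshold, mirroring arma_config::mat_prealloc (16 elements).
constexpr std::size_t kPrealloc = 16;

// Simplified stand-in for arma::Mat<double>; illustration only.
class TinyMat {
 public:
  TinyMat() = default;
  TinyMat(std::size_t rows, std::size_t cols) { resize(rows, cols); }
  TinyMat(const TinyMat&) = delete;
  TinyMat& operator=(const TinyMat&) = delete;
  ~TinyMat() { release(); }

  // Drops the old storage, then picks local or acquired memory by size.
  void resize(std::size_t rows, std::size_t cols) {
    release();
    const std::size_t n_elem = rows * cols;
    if (n_elem > kPrealloc) {
      void* p = std::malloc(n_elem * sizeof(double));
      if (p == nullptr) throw std::bad_alloc();
      mem_ = static_cast<double*>(p);
    }
    n_elem_ = n_elem;
  }

  bool uses_local_memory() const { return mem_ == local_; }

 private:
  // Only acquired (heap) memory is ever passed to free.
  void release() {
    if (mem_ != local_) std::free(mem_);
    mem_ = local_;
    n_elem_ = 0;
  }

  double local_[kPrealloc] = {};
  double* mem_ = local_;
  std::size_t n_elem_ = 0;
};
```

With this sketch, TinyMat(3, 3) never calls free, while TinyMat(11, 11) does so on destruction or when resized down to 3x3. Those are the two points where the debug output above shows "releasing memory".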
UPDATE: Still working on this, but it looks like the code fails because arma_extra_code is on (where it wasn't in the MRE). The culprit seems to be ARMA_ALIEN_MEM functions, set here
Credit to the carma developer, @RUrlus, for the answer:
The problem is due to the bindings module, linked to carma, being linked to a library that hasn't been linked to carma. The external library (mc in the MRE) was allocating memory using the standard malloc while pybind11 was using Carma's free on destruction. The mismatch was causing the crash on Windows.
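The mismatch can be illustrated with a small, safe simulation. This is only a sketch under assumptions: every name below is hypothetical, nothing here calls Armadillo, Carma or pybind11, and both "allocators" are backed by the same malloc so that the demo never actually corrupts the heap. It only records which allocator family produced a pointer and reports whether the matching one releases it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>

// Two allocator families: the standard one, and a replacement ("alien") one
// such as a build configured with Carma would use.
enum class Allocator { Standard, Alien };

// Demo bookkeeping: which family produced each live pointer.
std::unordered_map<void*, Allocator>& owners() {
  static std::unordered_map<void*, Allocator> table;
  return table;
}

void* acquire(Allocator which, std::size_t bytes) {
  void* p = std::malloc(bytes);
  if (p != nullptr) owners()[p] = which;
  return p;
}

// Returns true when the pointer is released by the family that allocated it.
// In a real program a false result corresponds to the heap corruption.
bool release(Allocator which, void* p) {
  auto it = owners().find(p);
  if (it == owners().end()) return false;
  const bool matches = (it->second == which);
  owners().erase(it);
  std::free(p);
  return matches;
}

// External library built WITHOUT the replacement allocator.
void* library_make_matrix(std::size_t n_elem) {
  return acquire(Allocator::Standard, n_elem * sizeof(double));
}

// Same library after being linked against the replacement allocator.
void* fixed_library_make_matrix(std::size_t n_elem) {
  return acquire(Allocator::Alien, n_elem * sizeof(double));
}

// Bindings module built WITH the replacement allocator frees on destruction.
bool bindings_destroy_matrix(void* mem) {
  return release(Allocator::Alien, mem);
}
```

Here bindings_destroy_matrix(library_make_matrix(121)) reports a mismatch, while bindings_destroy_matrix(fixed_library_make_matrix(121)) does not. Matrices small enough to use local memory never reach either function, which is consistent with the 3x3 cases above not crashing.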
Perhaps in the future there could be a more elegant solution, but for now the workaround is to link external libraries to carma at compile-time, e.g., target_link_libraries(mc PUBLIC armadillo carma), even if that external library doesn't include or use Carma in any obvious way.