Tags: python, c++, armadillo, pybind11

Why might I get heap corruption using Armadillo matrices with pybind11?


I've worked on this for a couple of weeks and can't make a reproducible example outside my codebase, which is why I need help. I'm not sure whether this is a problem with pybind11 or with Armadillo. It's not a problem with Carma, since it happens in situations where no conversion is going on.

EDIT: This does appear to be a bug in Carma after all, and my minimal reproducible example (MRE) now triggers it.

I've been trying to boil it down to an MRE without success. I'll explain what I know here, but since this isn't enough to reproduce the bug, what I need most are ideas about where to look. This is a very hard-to-reproduce memory corruption error (heap corruption, Windows fatal exception: code 0xc0000374). It seems to happen when I have an uninitialized matrix A and assign to it a matrix large enough that Armadillo acquires heap memory. The crash happens when that memory is released.

I'm on Windows 10, using Armadillo 10.6.2 and pybind11 v2.7.1, and compiling with Clang 11.

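```cpp
// mat_container.cpp
#include "mat_container.h"

#include <iostream>  // std::cerr

MatrixContainer::MatrixContainer(size_t d) {
    // Build a d x d identity matrix and move-assign it to the member A.
    A = arma::Mat<double>(d, d, arma::fill::eye);
    std::cerr << "filled arma matrix\n";
}
```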

Binding code

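```cpp
#include <pybind11/numpy.h>
#include <pybind11/pybind11.h>
#include <armadillo>
#include <carma>
#include "mat_container.h"

namespace py = pybind11;

PYBIND11_MODULE(example, m) {
  // Expose MatrixContainer as example.MC; the constructor takes only the size d.
  py::class_<MatrixContainer>(m, "MC")
      .def(py::init<size_t>())
      // carma's type casters convert between arma::Mat<double> and NumPy arrays.
      .def_readwrite("A", &MatrixContainer::A);
}
```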

All I need to do to trigger the crash is call example.MC(11) from Python. It crashes while releasing memory in the destructor of the member matrix A (not of the temporary that was assigned to it). These are the Armadillo debug messages right before the crash:

```
@ __cdecl arma::Mat<double>::~Mat(void) [eT = double] [this = 000001569470DBA0]
Mat::destructor: releasing memory
```

In my attempted MRE, I tried to reproduce the same structure, with the binding code using this MatrixContainer class compiled in a separate library. I don't know what is missing that keeps my MRE from reproducing the bug.

Weird things

  • Only happens when calling from pybind11, not pure C++
  • Only happens when the constructor for the class containing the matrix is in a separately compiled source file. It doesn't happen when the constructor is in the header file.
  • Only happens when assigning a non-identity matrix (e.g., np.ones but not np.eye) to an object's matrix member variable. But this only happens with a class in my own source code, not with the smaller test class I created!
  • Only happens when A is large enough for Armadillo to acquire memory instead of using local memory
  • Setting the size of A with set_size() before assigning to it doesn't solve the issue
  • This appears to be Armadillo-specific: the crash occurs when Armadillo releases memory (confirmed using ARMA_EXTRA_DEBUG), and I don't get a similar error when I use another class with dynamically allocated memory, such as std::vector
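
The "local vs acquired memory" behavior in the list above is a small-buffer optimization, sketched below. This is a simplified illustration, not Armadillo's implementation: `ToyMat` and `check_sizes_from_question` are hypothetical, and the 16-element threshold is an assumption chosen to match Armadillo's default `arma_config::mat_prealloc`. Under that threshold, the 3x3 matrices in this post (9 elements) would stay local while the 11x11 ones (121 elements) would acquire heap memory.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Simplified model of a matrix that keeps small element arrays inside the
// object and acquires heap memory only for larger ones.
// ToyMat is hypothetical; kPrealloc = 16 mirrors Armadillo's default mat_prealloc.
class ToyMat {
public:
    static constexpr std::size_t kPrealloc = 16;

    ToyMat(std::size_t rows, std::size_t cols) : n_elem_(rows * cols) {
        if (n_elem_ <= kPrealloc) {
            mem_ = local_;  // small matrix: use the embedded buffer, no allocation
        } else {
            heap_ = std::make_unique<double[]>(n_elem_);  // large matrix: acquire heap memory
            mem_ = heap_.get();
        }
    }

    // Copying would leave mem_ pointing into the source's local_ buffer,
    // so the model simply forbids it.
    ToyMat(const ToyMat&) = delete;
    ToyMat& operator=(const ToyMat&) = delete;

    bool uses_local_memory() const { return mem_ == local_; }

private:
    std::size_t n_elem_;
    double local_[kPrealloc] = {};
    std::unique_ptr<double[]> heap_;
    double* mem_ = nullptr;
};

// Sizes from the question: 3x3 (9 elements) stays local, 11x11 (121) is acquired.
inline void check_sizes_from_question() {
    assert(ToyMat(3, 3).uses_local_memory());
    assert(!ToyMat(11, 11).uses_local_memory());
}
```

In this model only acquired (heap) memory is ever released, which is why a crash tied to releasing memory can only show up once a matrix crosses the threshold.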

The crash occurs when A is using acquired memory and is replaced by a matrix using local memory:

```python
mc = MC(11)
mc.A = np.eye(3)
```

Armadillo debug messages right before the crash, as mc.A is being assigned:

```
@ void __cdecl arma::Mat<double>::init_warm(arma::uword, arma::uword) [eT = double] [in_n_rows = 3, in_n_cols = 3]
Mat::init(): releasing memory
```

But the reverse case runs successfully; it does not crash when the acquired memory is released as A is destroyed:

```python
mc = MC(3)
mc.A = np.eye(11)
```

And when an 11x11 A is replaced by another 11x11 matrix from NumPy, the crash again happens at "Mat::destructor: releasing memory", as in the very first example with no assignment from Python. What I deduce from these examples is that the crash is averted when an acquired-memory matrix prepared by Carma (not by Armadillo directly) is assigned to a local-memory A.
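
Tagging each acquired block with the module that acquired it makes the pattern in these examples easier to see. The sketch below is a bookkeeping model only; `ModelMat` and `run_scenario` are hypothetical names, not Armadillo or carma APIs. It assumes Armadillo's default 16-element `mat_prealloc`, that move assignment hands the temporary's heap block over to A, and that assigning a same-size matrix copies into the existing memory.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Bookkeeping model of the Python scenarios above; nothing is really allocated.
// ModelMat and run_scenario are hypothetical names, not Armadillo or carma APIs.
constexpr std::size_t kModelPrealloc = 16;  // assumed: Armadillo's default mat_prealloc

struct ModelMat {
    std::size_t n_elem = 0;
    std::size_t n_alloc = 0;                  // 0 means "using local memory"
    std::string heap_origin;                  // module that acquired the current heap block
    std::vector<std::string>* log = nullptr;  // records released blocks

    // Release the current heap block (if any) and record where it came from.
    void release() {
        if (n_alloc == 0) return;
        assert(log != nullptr);
        log->push_back(heap_origin);
        heap_origin.clear();
        n_alloc = 0;
    }

    // Simplified version of the resize-on-assignment logic.
    void assign(std::size_t new_n_elem, const std::string& origin) {
        if (new_n_elem == n_elem) return;  // same size: copy into existing memory
        if (new_n_elem <= kModelPrealloc) {
            release();                     // small result: switch to local memory
        } else if (new_n_elem > n_alloc) {
            release();                     // block too small: drop it, acquire a new one
            heap_origin = origin;
            n_alloc = new_n_elem;
        }                                  // otherwise the existing block is reused
        n_elem = new_n_elem;
    }
};

// MC(initial_dim); if new_dim != 0, mc.A = <new_dim x new_dim array>; then MC is
// destroyed. Returns the origin of every heap block released along the way.
inline std::vector<std::string> run_scenario(std::size_t initial_dim, std::size_t new_dim) {
    std::vector<std::string> released;
    ModelMat A;
    A.log = &released;
    // Constructor in mat_container.cpp: the empty member takes over the
    // temporary's memory, so any heap block was acquired in that file.
    A.assign(initial_dim * initial_dim, "mat_container.cpp");
    if (new_dim != 0) {
        // The def_readwrite setter runs in the bindings module.
        A.assign(new_dim * new_dim, "bindings module");
    }
    A.release();  // MC's destructor
    return released;
}
```

Every case that crashed releases a block acquired in mat_container.cpp, while the one case that ran successfully, `run_scenario(3, 11)`, releases only a block acquired in the bindings module. That pattern is consistent with the allocator mismatch described in the Solution at the end of this post.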

UPDATE: I'm still working on this, but it looks like the code fails because arma_extra_code is on (it wasn't in the MRE). The culprit seems to be the ARMA_ALIEN_MEM functions, set here


Solution

  • Credit to the carma developer, @RUrlus, for the answer:

    The problem is that the bindings module, which is linked to carma, is also linked to a library that isn't. The external library (mc in the MRE) was allocating memory with the standard malloc, while pybind11 was using Carma's free on destruction. This mismatch caused the crash on Windows.

    Perhaps in the future there could be a more elegant solution, but for now the workaround is to link external libraries to carma at compile time, e.g., target_link_libraries(mc PUBLIC armadillo carma), even if that external library doesn't include or use Carma in any obvious way.
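
The allocator mismatch described in the answer can be illustrated with two allocate/free pairs that stamp each block with its owner. This is a toy model, not carma's or Armadillo's code: `Heap`, `allocate`, `release`, and `free_across_modules` are hypothetical names. `Standard` plays the role of the library built without carma and `Hooked` the bindings module built with it. Unlike this sketch, a real mismatched free is not detected; on Windows it surfaces as the heap-corruption exception 0xc0000374.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Toy model of two allocator pairs that must not be mixed, standing in for
// "Armadillo's default allocation" and "Armadillo with carma's hooks".
// All names are hypothetical. Each block carries a header naming its owner,
// so a mismatched free is detected here; in real code it corrupts the heap.
enum class Heap { Standard, Hooked };

struct alignas(std::max_align_t) Header {
    Heap owner;
};

inline void* allocate(Heap heap, std::size_t bytes) {
    void* raw = std::malloc(sizeof(Header) + bytes);
    if (raw == nullptr) return nullptr;
    ::new (raw) Header{heap};  // stamp the owner in front of the user data
    return static_cast<char*>(raw) + sizeof(Header);
}

// Frees p only if `heap` allocated it; returns false on a mismatch.
inline bool release(Heap heap, void* p) {
    if (p == nullptr) return true;
    void* raw = static_cast<char*>(p) - sizeof(Header);
    if (static_cast<Header*>(raw)->owner != heap) return false;
    std::free(raw);
    return true;
}

// The library allocates a 121-element block (an 11x11 matrix) with lib_heap;
// the bindings module later frees it with bindings_heap.
inline bool free_across_modules(Heap lib_heap, Heap bindings_heap) {
    void* block = allocate(lib_heap, 121 * sizeof(double));
    assert(block != nullptr);
    const bool ok = release(bindings_heap, block);
    if (!ok) {
        release(lib_heap, block);  // clean up with the matching pair
    }
    return ok;
}
```

`free_across_modules(Heap::Standard, Heap::Hooked)` models the original build and fails. Linking `mc` to carma corresponds to using the same pair on both sides, `free_across_modules(Heap::Hooked, Heap::Hooked)`, which succeeds.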