Complex C++ lifetime issue in Python bindings between C++ and numpy

I'm looking for advice on how to handle a complex lifetime issue between C++ and numpy / Python. Sorry for the wall of text, but I wanted to provide as much context as possible.

I developed cvnp, a library that offers casts between cv::Mat and py::array objects when using pybind11, so that the memory is shared between the two. It was originally based on a SO answer by Dan Mašek. All is going well, and the library is used in several projects, including robotpy, a Python library for the FIRST Robotics Competition.

However, a user raised an issue concerning the lifetime of linked cv::Mat and py::array objects.

In the direction cv::Mat -> py::array, all is well: mat_to_nparray creates a py::array that keeps a reference to the linked cv::Mat via a "capsule" (a Python handle).
However, in the direction py::array -> cv::Mat, the cv::Mat created by nparray_to_mat accesses the data of the py::array without holding any reference to the array, so nothing guarantees that the py::array lives as long as the cv::Mat.

See mat_to_nparray:

py::capsule make_capsule_mat(const cv::Mat& m)
{
    return py::capsule(new cv::Mat(m)
        , [](void *v) { delete reinterpret_cast<cv::Mat*>(v); }
    );
}

pybind11::array mat_to_nparray(const cv::Mat& m)
{
    return pybind11::array(detail::determine_np_dtype(m.depth())
        , detail::determine_shape(m)
        , detail::determine_strides(m)
        , m.data
        , detail::make_capsule_mat(m)
        );
}

and nparray_to_mat:

cv::Mat nparray_to_mat(pybind11::array& a)
{
    ...
    cv::Mat m(size, type, is_not_empty ? a.mutable_data(0) : nullptr);
    return m;
}

This worked well so far, until a user wrote this:

A bound C++ function that returns the same cv::Mat that was passed as an argument:

m.def("test", [](cv::Mat mat) { return mat; });

Some Python code that uses this function:

img = np.zeros(shape=(480, 640, 3), dtype=np.uint8)
img = test(img)

In that case, a segmentation fault may occur: rebinding img releases the last reference to the original py::array, so its buffer is freed while the returned array still wraps a cv::Mat that points into it. In other words, the py::array is destroyed before the cv::Mat that uses its data. The crash is not systematic, however, and depends on the OS and Python version.
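To make the failure mode concrete, here is a library-free sketch of the round trip (no pybind11 or OpenCV involved). All names are hypothetical stand-ins: `std::shared_ptr` plays the role of Python's reference count, `FakeMat` mimics the cv::Mat built by the original nparray_to_mat (raw pointer only), and `FakeArray::capsule` mimics the capsule created by mat_to_nparray. A `std::weak_ptr` probe reports whether the buffer is still alive, so the dangling pointer is never dereferenced.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for the memory owned by a numpy array.
using Buffer = std::vector<std::uint8_t>;

// Stand-in for the cv::Mat from the original nparray_to_mat: borrowed pointer only.
struct FakeMat {
    std::uint8_t* data;
};

// Stand-in for a numpy array: it either owns a buffer,
// or keeps a FakeMat alive through a "capsule".
struct FakeArray {
    std::shared_ptr<Buffer> buffer;
    std::shared_ptr<FakeMat> capsule;
};

// Mimics m.def("test", [](cv::Mat mat) { return mat; });
FakeArray round_trip(const FakeArray& in) {
    FakeMat m{in.buffer->data()};                            // borrows, no reference taken
    return FakeArray{nullptr, std::make_shared<FakeMat>(m)}; // owns the FakeMat only
}

// img = test(img): the only owner of the buffer is overwritten.
bool buffer_alive_after_rebinding() {
    FakeArray img{std::make_shared<Buffer>(480 * 640 * 3, 0), nullptr};
    std::weak_ptr<Buffer> probe = img.buffer;
    img = round_trip(img);
    return !probe.expired();  // img.capsule->data now dangles
}

// out = test(img): the original array keeps its name, so the buffer survives.
bool buffer_alive_with_second_name() {
    FakeArray img{std::make_shared<Buffer>(480 * 640 * 3, 0), nullptr};
    std::weak_ptr<Buffer> probe = img.buffer;
    FakeArray out = round_trip(img);
    (void)out;
    return !probe.expired();
}
```

In this model the first function reports that the buffer is gone and the second that it is still alive. Under the assumptions of the sketch, that matches the report: the crash only becomes possible once the original array loses its last reference.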

I was able to reproduce it in CI via this commit using ASAN. The reproducing code is fairly simple:

void test_lifetime()
{
    // We need to create a big array to trigger a segfault
    auto create_example_array = []() -> pybind11::array
    {
        constexpr int rows = 1000, cols = 1000;
        std::vector<pybind11::ssize_t> a_shape{rows, cols};
        std::vector<pybind11::ssize_t> a_strides{};
        pybind11::dtype a_dtype = pybind11::dtype(pybind11::format_descriptor<int32_t>::format());
        pybind11::array a(a_dtype, a_shape, a_strides);
        // Set initial values
        for(int i=0; i<rows; ++i)
            for(int j=0; j<cols; ++j)
                *((int32_t *)a.mutable_data(j, i)) = j * rows + i;

        printf("Created array data address =%p\n%s\n",
               a.data(),
               py::str(a).cast<std::string>().c_str());
        return a;
    };

    // Let's reimplement the bound version of the test function via pybind11:
    auto test_bound = [](pybind11::array& a) {
        cv::Mat m = cvnp::nparray_to_mat(a);
        return cvnp::mat_to_nparray(m);
    };

    // Now let's reimplement the failing python code in C++
    //    img = np.zeros(shape=(480, 640, 3), dtype=np.uint8)
    //    img = test(img)
    auto img = create_example_array();
    img = test_bound(img);

    // Let's try to change the content of the img array
    *((int32_t *)img.mutable_data(0, 0)) = 14;  // This triggers an error that ASAN catches
    printf("img data address =%p\n%s\n",
           img.data(),
           py::str(img).cast<std::string>().c_str());
}

I'm looking for advice on how to handle this issue. I see several options:

An ideal solution would be to

call inc_ref() on the pybind11::array when constructing the cv::Mat inside nparray_to_mat
make sure that dec_ref() is called on that array when this particular cv::Mat instance is destroyed. However, I do not see how to do it.
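The inc_ref / dec_ref pairing described above is essentially an RAII guard. Here is a minimal, library-free sketch of that idea. `FakePyObject`, `KeepAlive` and `FakeMat` are hypothetical names that stand in for the Python object, the guard, and a matrix type able to carry the guard as a member. cv::Mat has no such spare member, which is exactly the difficulty.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a Python object with a manual reference count.
struct FakePyObject {
    int refcount = 1;
    void inc_ref() { ++refcount; }
    void dec_ref() { --refcount; }
};

// RAII guard: inc_ref on construction and copy, dec_ref on destruction.
class KeepAlive {
public:
    explicit KeepAlive(FakePyObject& o) : obj_(&o) { obj_->inc_ref(); }
    KeepAlive(const KeepAlive& other) : obj_(other.obj_) { if (obj_) obj_->inc_ref(); }
    KeepAlive(KeepAlive&& other) noexcept : obj_(std::exchange(other.obj_, nullptr)) {}
    KeepAlive& operator=(KeepAlive other) noexcept {
        std::swap(obj_, other.obj_);
        return *this;
    }
    ~KeepAlive() { if (obj_) obj_->dec_ref(); }

private:
    FakePyObject* obj_;
};

// Hypothetical matrix-like type that carries the guard along with it.
struct FakeMat {
    KeepAlive owner;
    explicit FakeMat(FakePyObject& o) : owner(o) {}
};

// Reference count seen while two FakeMat instances are alive.
int refcount_while_mats_alive() {
    FakePyObject array;   // one reference held by "Python"
    FakeMat m(array);     // guard adds one
    FakeMat copy = m;     // copying the guard adds another
    return array.refcount;
}

// Reference count once every FakeMat has been destroyed.
int refcount_after_mats_destroyed() {
    FakePyObject array;
    {
        FakeMat m(array);
        FakeMat copy = m;
    }
    return array.refcount;
}
```

Each copy takes its own reference here, and each destructor gives it back, so the count returns to its initial value once all copies are gone.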

Note: I know that cv::Mat can use a custom allocator, but it is useless here, as the cv::Mat will not allocate the memory itself, but will use the memory of the py::array object.

Thanks for reading this far, and thanks in advance for any advice!

Solution

Well, the solution was inspired by cv2_numpy.cpp in the OpenCV source code, and was implemented thanks to the help of Dustin Spicuzza.

It uses a custom MatAllocator that never allocates memory itself: it attaches the numpy array's existing buffer to the cv::Mat and keeps a reference to the array, which it releases when the cv::Mat data is deallocated.

        // Translated from cv2_numpy.cpp in OpenCV source code
        class CvnpAllocator : public cv::MatAllocator
        {
        public:
            CvnpAllocator() = default;
            ~CvnpAllocator() = default;

            // Attaches a numpy array object to a cv::Mat
            static void attach_nparray(cv::Mat &m, pybind11::array& a)
            {
                static CvnpAllocator instance;

                cv::UMatData* u = new cv::UMatData(&instance);
                u->data = u->origdata = (uchar*)a.mutable_data(0);
                u->size = a.nbytes(); // size in bytes (a.size() is the number of elements)
                
                // This is the secret sauce: we inc the number of ref of the array
                u->userdata = a.inc_ref().ptr();
                u->refcount = 1;

                m.u = u;
                m.allocator = &instance;
            }

            cv::UMatData* allocate(int dims0, const int* sizes, int type, void* data, size_t* step, cv::AccessFlag flags, cv::UMatUsageFlags usageFlags) const override
            {
                throw py::value_error("CvnpAllocator::allocate \"standard\" should never happen");
                // return stdAllocator->allocate(dims0, sizes, type, data, step, flags, usageFlags);
            }

            bool allocate(cv::UMatData* u, cv::AccessFlag accessFlags, cv::UMatUsageFlags usageFlags) const override
            {
                throw py::value_error("CvnpAllocator::allocate \"copy\" should never happen");
                // return stdAllocator->allocate(u, accessFlags, usageFlags);
            }

            void deallocate(cv::UMatData* u) const override
            {
                if(!u)
                    return;

                // This function can be called from anywhere, so need the GIL
                py::gil_scoped_acquire gil;
                assert(u->urefcount >= 0);
                assert(u->refcount >= 0);
                if(u->refcount == 0)
                {
                    PyObject* o = (PyObject*)u->userdata;
                    Py_XDECREF(o);
                    delete u;
                }
            }
        };


    cv::Mat nparray_to_mat(pybind11::array& a)
    {
        bool is_contiguous = is_array_contiguous(a);
        bool is_not_empty = a.size() != 0;
        if (! is_contiguous && is_not_empty) {
            throw std::invalid_argument("cvnp::nparray_to_mat / Only contiguous numpy arrays are supported. / Please use np.ascontiguousarray() to convert your matrix");
        }

        int depth = detail::determine_cv_depth(a.dtype());
        int type = detail::determine_cv_type(a, depth);
        cv::Size size = detail::determine_cv_size(a);
        cv::Mat m(size, type, is_not_empty ? a.mutable_data(0) : nullptr);

        if (is_not_empty) {
            detail::CvnpAllocator::attach_nparray(m, a);
        }

        return m;
    }
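Stripped of the pybind11 and OpenCV specifics, the mechanism can be sketched as follows. This is only an illustration under stated assumptions, not the cvnp code: `Header` stands in for cv::UMatData, `Allocator` for the custom MatAllocator, `View` for cv::Mat, and `FakePyObject` for the numpy array. All of these names are hypothetical. The owner is referenced once when the header is attached, copies of the view only bump the header's own count, and the deallocate hook releases the owner when the last view goes away.

```cpp
#include <cassert>

// Hypothetical stand-in for the numpy array and its reference count.
struct FakePyObject {
    int refcount = 1;
};

struct Allocator;

// Plays the role of cv::UMatData: bookkeeping shared by all views of one buffer.
struct Header {
    const Allocator* allocator;
    int refcount;    // number of views sharing this header
    void* userdata;  // the foreign owner to release at the end
};

// Plays the role of the custom MatAllocator: it never allocates, it only releases.
struct Allocator {
    void deallocate(Header* h) const {
        if (!h)
            return;
        auto* owner = static_cast<FakePyObject*>(h->userdata);
        --owner->refcount;  // analogous to Py_XDECREF
        delete h;
    }
};

// Plays the role of cv::Mat.
class View {
public:
    View(FakePyObject& owner, const Allocator& alloc)
        : h_(new Header{&alloc, 1, &owner}) {
        ++owner.refcount;  // analogous to a.inc_ref()
    }
    View(const View& other) : h_(other.h_) { ++h_->refcount; }
    View& operator=(const View&) = delete;
    ~View() {
        if (--h_->refcount == 0)
            h_->allocator->deallocate(h_);
    }

private:
    Header* h_;
};

// Owner reference count while two views share one header.
int owner_refcount_during() {
    static const Allocator alloc{};
    FakePyObject array;
    View a(array, alloc);
    View b = a;  // shares the header, the owner count is unchanged
    return array.refcount;
}

// Owner reference count after the last view is destroyed.
int owner_refcount_after() {
    static const Allocator alloc{};
    FakePyObject array;
    {
        View a(array, alloc);
        View b = a;
    }
    return array.refcount;
}
```

The design point is that the hook lives in the shared header rather than in the view itself, so the release happens exactly once regardless of how many copies were made.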

See code in the repository here and here

@dan-mašek: your input would be welcome!