Python C API: Assigning PyObjects to a dictionary causes memory leak

I am writing a C++ wrapper for Python using the Python C API. In my case I have to make large amounts of byte-oriented data accessible to the Python script. For this purpose I use the PyByteArray_FromStringAndSize function to produce a Python bytearray (https://docs.python.org/2.7/c-api/bytearray.html).

When I return this bytearray directly, I have not experienced any problems. However, when I add the bytearray to a Python dict, the bytearray's memory is not released once the dict is destroyed.

This can be solved by calling Py_DECREF on the bytearray object after adding the bytearray object to the Python dict.

Below is a complete working example of my code. It contains a function dummyArrPlain that returns the plain bytearray and a function dummyArrInDict that returns a bytearray in a dict. The second function produces a memory leak unless Py_DECREF(pyData); is called.

My question is: why is Py_DECREF necessary at this point? Intuitively I would have expected Py_DECREF to be called once the dict is destroyed.

I also assign values to a dict like this:

PyDict_SetItem(dict, PyString_FromString("i"), PyInt_FromLong(i));

Will this also produce a memory leak if I do not call Py_DECREF on the created string and int?

This is my dummy C++ wrapper:

#include <python2.7/Python.h>

static char module_docstring[] = "This is a module causing a memory leak";

static PyObject *dummyArrPlain(PyObject *self, PyObject *args);
static PyObject *dummyArrInDict(PyObject *self, PyObject *args);

static PyMethodDef module_methods[] = {
    {"dummy_arr_plain", dummyArrPlain, METH_VARARGS, "returns a plain dummy bytearray"},
    {"dummy_arr_in_dict", dummyArrInDict, METH_VARARGS, "returns a dummy bytearray in a dict"},
    {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC initlibdummy(void)
{
    PyObject *m = Py_InitModule("libdummy", module_methods);
    if (m == NULL)
        return;
}


static PyObject *dummyArrPlain(PyObject *self, PyObject *args)
{
    int len = 10000000;
    char* data = new char[len];
    for(int i=0; i<len; i++) {
        data[i] = 0;
    }

    PyObject * pyData = PyByteArray_FromStringAndSize(data, len);
    delete [] data;

    return pyData;
}


static PyObject *dummyArrInDict(PyObject *self, PyObject *args)
{
    int len = 10000000;
    char* data = new char[len];
    for(int i=0; i<len; i++) {
        data[i] = 0;
    }
    PyObject * pyData = PyByteArray_FromStringAndSize(data, len);
    delete [] data;

    PyObject *dict = PyDict_New();
    PyDict_SetItem(dict, PyString_FromString("data"), pyData);

    // memory leak without Py_DECREF(pyData);

    return dict;
}

And a dummy Python script using the wrapper:

import libdummy
import time

while True:
    a = libdummy.dummy_arr_in_dict()
    time.sleep(0.01)

Solution

It's a matter of ownership rules (see [Python 2.0.Docs]: Ownership rules). I'm going to use Python 2.7.10 as the example (pretty old, but I don't think the behavior has changed significantly along the way).

PyByteArray_FromStringAndSize (bytearrayobject.c: 168) creates a new object (using PyObject_New, and allocates memory for the buffer as well).

By default, the refcount of that object (or better: of any newly created object) is 1 (set by _Py_NewReference). When the user calls del on it, or at program exit, the refcount is decreased, and when it reaches 0 the object is deallocated.

This is the behavior on the flow where the object is returned directly (dummyArrPlain).
But in dummyArrInDict's case, PyDict_SetItem (indirectly) does a Py_INCREF of pyData (it does other things too, but only this is relevant here), so the refcount ends up at 2. When the dict is destroyed, the refcount only drops back to 1, never reaches 0, and the bytearray is never deallocated: hence the memory leak.
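The extra reference taken by the dict can be observed from the Python side. The following is only a rough sketch, assuming CPython (where sys.getrefcount is available); the names py_data and d are illustrative, and a dict literal stands in for the PyDict_SetItem call made in C. Only the differences between counts are meaningful, since the absolute numbers include temporary references.

```python
import sys

# Stand-in for the bytearray created in dummyArrInDict.
py_data = bytearray(1000)

before = sys.getrefcount(py_data)

# Storing the object in a dict makes the dict take its own reference,
# the same effect PyDict_SetItem has at the C level.
d = {"data": py_data}
in_dict = sys.getrefcount(py_data)

# Destroying the dict releases only the dict's own reference.
del d
after_del = sys.getrefcount(py_data)

print(in_dict - before)    # 1: the dict added exactly one reference
print(after_del - before)  # 0: back to the count held before the insert
```

In other words, the dict only ever gives back what it took. The reference that existed before the insert is still the creator's responsibility.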

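It's basically the same thing that you're doing with data: you allocate memory for it, and when you no longer need it, you free it (because you're not returning it, you only use it temporarily). Likewise, once the dict holds its own reference to pyData, you must release yours with Py_DECREF(pyData). The same goes for the key created by PyString_FromString and the value created by PyInt_FromLong in your one-liner: PyDict_SetItem does not take over ("steal") either reference, so both leak unless you keep the pointers and Py_DECREF them after the call.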
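To make the leak and its fix visible without compiling the extension, here is a small simulation. It is a sketch under stated assumptions, not the wrapper itself: it assumes CPython, uses ctypes.pythonapi to reach Py_IncRef and Py_DecRef (the function forms of the macros), and introduces a hypothetical Buffer class and dummy_arr_in_dict function in place of the real bytearray and C function. The explicit py_incref models the reference that the C code owns after creating the object.

```python
import ctypes
import weakref


class Buffer:
    """Hypothetical stand-in for the bytearray; plain instances support weakrefs."""

    def __init__(self, size):
        self.data = bytearray(size)


# Function forms of Py_INCREF / Py_DECREF exported by the CPython library.
py_incref = ctypes.pythonapi.Py_IncRef
py_decref = ctypes.pythonapi.Py_DecRef
for fn in (py_incref, py_decref):
    fn.argtypes = [ctypes.py_object]
    fn.restype = None


def dummy_arr_in_dict(release_own_reference):
    buf = Buffer(1000)
    probe = weakref.ref(buf)   # observes the object without owning it
    py_incref(buf)             # models the reference the C code owns after creation
    result = {"data": buf}     # the dict takes its own, second reference
    if release_own_reference:
        py_decref(buf)         # models Py_DECREF(pyData) after PyDict_SetItem
    return result, probe


leaky_dict, leaky_probe = dummy_arr_in_dict(release_own_reference=False)
fixed_dict, fixed_probe = dummy_arr_in_dict(release_own_reference=True)

# Destroy both dicts, as the test script does on every loop iteration.
del leaky_dict, fixed_dict

leaked_still_alive = leaky_probe() is not None
fixed_was_freed = fixed_probe() is None

# Supplying the missing release afterwards frees the leaked object too.
obj = leaky_probe()
py_decref(obj)
del obj
leaked_freed_after_decref = leaky_probe() is None

print(leaked_still_alive, fixed_was_freed, leaked_freed_after_decref)
```

On CPython this should print True True True: without the release the object outlives its dict, with the release it is freed together with the dict.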

Note: It's safer to use the X macros (e.g. [Python 2.Docs]: Py_XDECREF), especially since you're not testing the returned PyObjects for NULL.

For more details, also take a look at [Python 2.Docs]: C API Reference.