
Python C API: Assigning PyObjects to a dictionary causes memory leak


I am writing a C++ wrapper for Python using the Python C API. In my case I have to make large amounts of byte-oriented data accessible to the Python script. For this purpose I use the PyByteArray_FromStringAndSize function to produce a Python bytearray (https://docs.python.org/2.7/c-api/bytearray.html).

When returning this bytearray directly I have not experienced any problems. However, when I add the bytearray to a Python dict, the bytearray's memory is not released once the dict is destroyed.

This can be solved by calling Py_DECREF on the bytearray object after adding the bytearray object to the Python dict.

Below is a complete working example of my code containing a method dummyArrPlain returning the plain bytearray and a method dummyArrInDict returning a bytearray in a dict. The second method will produce a memory leak unless Py_DECREF(pyData); is called.

My question is: why is Py_DECREF necessary at this point? Intuitively I would have expected Py_DECREF to be called once the dict is destroyed.

I also assign values to a dict like this:

PyDict_SetItem(dict, PyString_FromString("i"), PyInt_FromLong(i));

Will this also produce a memory leak when not calling Py_DECREF on the created string and long?

This is my dummy C++ wrapper:

#include <python2.7/Python.h>

static char module_docstring[] = "This is a module causing a memory leak";

static PyObject *dummyArrPlain(PyObject *self, PyObject *args);
static PyObject *dummyArrInDict(PyObject *self, PyObject *args);

static PyMethodDef module_methods[] = {
    {"dummy_arr_plain", dummyArrPlain, METH_VARARGS, "returns a plain dummy bytearray"},
    {"dummy_arr_in_dict", dummyArrInDict, METH_VARARGS, "returns a dummy bytearray in a dict"},
    {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC initlibdummy(void)
{
    PyObject *m = Py_InitModule("libdummy", module_methods);
    if (m == NULL)
        return;
}


static PyObject *dummyArrPlain(PyObject *self, PyObject *args)
{
    int len = 10000000;
    char* data = new char[len];
    for(int i=0; i<len; i++) {
        data[i] = 0;
    }

    PyObject * pyData = PyByteArray_FromStringAndSize(data, len);
    delete [] data;

    return pyData;
}


static PyObject *dummyArrInDict(PyObject *self, PyObject *args)
{
    int len = 10000000;
    char* data = new char[len];
    for(int i=0; i<len; i++) {
        data[i] = 0;
    }
    PyObject * pyData = PyByteArray_FromStringAndSize(data, len);
    delete [] data;

    PyObject *dict = PyDict_New();
    PyDict_SetItem(dict, PyString_FromString("data"), pyData);

    // memory leak without Py_DECREF(pyData);

    return dict;
}

And a dummy python script using the wrapper:

import libdummy
import time

while True:
    a = libdummy.dummy_arr_in_dict()
    time.sleep(0.01)

Solution

  • This comes down to the reference ownership rules ([Python 2.0.Docs]: Ownership rules). I'm going to use Python 2.7.10 as the example (pretty old, but I don't think the behavior has changed significantly along the way).

    PyByteArray_FromStringAndSize (bytearrayobject.c: 168) creates a new object (using PyObject_New, and allocates memory for the buffer as well).

    By default, the refcount of that object (or rather, of any newly created object) is 1 (set by _Py_NewReference). When the user calls del on it, or at program exit, the refcount is decreased, and once it reaches 0 the object is deallocated.

    • This is what happens on the path where the object is returned directly (dummyArrPlain)

    • But in dummyArrInDict's case, PyDict_SetItem (indirectly) does a Py_INCREF of pyData (it does other stuff, but only this is relevant here), so the refcount becomes 2. When the dict is destroyed it releases only its own reference, the refcount drops back to 1 and never reaches 0, hence the memory leak

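    It's basically the same thing that you're doing with data: you allocate memory for it, and when you no longer need it, you free it (because you're not returning it, you only use it temporarily). The reference returned by PyByteArray_FromStringAndSize is yours in the same way, so you have to release it with Py_DECREF once the dict holds its own reference. The same goes for PyDict_SetItem(dict, PyString_FromString("i"), PyInt_FromLong(i)) and for the "data" key in dummyArrInDict: PyDict_SetItem does not take over the caller's references to the key or the value, so each newly created object needs its own Py_DECREF after the call.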

    Note: It's safer to use the X macros (e.g. [Python 2.Docs]: Py_XDECREF), especially since you're not testing the returned PyObjects for NULL.

    For more details, also take a look at [Python 2.Docs]: C API Reference.
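
The reference counting described above can also be observed from the Python side. The following is only a minimal sketch, assuming CPython (sys.getrefcount is CPython-specific); the names payload and container are made up for the illustration and are not part of the wrapper. Only differences between counts are compared, because the absolute numbers include temporary references held by the call itself.

```python
import sys

# Stand-in for the bytearray created by PyByteArray_FromStringAndSize
# (hypothetical name, small size just for the demonstration).
payload = bytearray(16)

# Baseline count while only the variable "payload" refers to the object.
base = sys.getrefcount(payload)

container = {}
container["data"] = payload          # the dict takes its own reference
in_dict = sys.getrefcount(payload)

del container                        # the dict releases only the reference it took
after_del = sys.getrefcount(payload)

print(in_dict - base)    # 1: one extra reference, owned by the dict
print(after_del - base)  # 0: back to the baseline, the object is still alive
```

In this sketch the name payload plays the role of the reference that dummyArrInDict still holds after PyDict_SetItem. Python drops that reference automatically when the name goes away, whereas in C++ nothing does so unless Py_DECREF(pyData) is called explicitly.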