python, python-c-api

How to return a cached PyObject* value several times without memory leaking or double/triple frees?


In my last question, "How to improve Python C Extensions file line reading?", some memory problems in my code were brought up. So I wrote this simple, deliberately silly Python C extension, purely to better understand borrowed and owned references in Python.

static void PyFastFile_dealloc(PyFastFile* self) {
    for( PyObject* pyobject : linecache ) {
        Py_XDECREF( pyobject );
    }
    Py_TYPE(self)->tp_free( (PyObject*) self );
}

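static PyObject* PyFastFile_tp_iter(PyFastFile* self) {
    counter = 10;
    std::string line{"sample line"};
    PyObject* obj = PyUnicode_DecodeUTF8( line.c_str(), line.size(), "replace" );
    if( obj == NULL ) {
        return NULL;
    }
    linecache.push_back( obj );

    // tp_iter must return a new reference to the iterator (here, the object itself)
    Py_INCREF( self );
    return (PyObject*) self;
}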

static PyObject* PyFastFile_iternext(PyFastFile* self) {
    --counter;

    if( !( counter ) ) {
        PyErr_SetNone( PyExc_StopIteration );
        return NULL;
    }

    PyObject* retval = linecache[0];
    Py_XINCREF( retval );
    return retval;
}

// create the module
PyMODINIT_FUNC PyInit_fastfilepackage(void) {
    PyFastFileType.tp_iter = (getiterfunc) PyFastFile_tp_iter;
    PyFastFileType.tp_iternext = (iternextfunc) PyFastFile_iternext;
    PyFastFileType.tp_dealloc = (destructor) PyFastFile_dealloc;
    ...
}

In this case:

  1. Is tp_iternext() returning an owned reference to linecache[0] because it calls Py_XINCREF on it?
  2. If so, is the pointer stored in linecache[0] now a borrowed reference?
  3. Since tp_iternext() is called more than once and returns the same pointer each time, calling Py_XINCREF on it every time, will this lead to double/triple/multiple frees?
  4. Does the object returned by tp_iternext() have only one owned reference, which would lead to double/triple/multiple frees?

Related:

  1. Py_INCREF/DECREF: When

Solution

  • Basically, the reference count of an object needs to equal the number of PyObject* pointers that hold on to it.

    1. Yes - you return a PyObject* to your Python code, and tp_iternext is expected to return a new reference, so you should increment the reference count.

    2. No - when the object in linecache[0] is created (PyUnicode_DecodeUTF8 returns a new reference) it has a refcount of 1, and this represents ownership by linecache. Multiple places can "own" a single Python object.

    3. Yes, you are returning the same pointer multiple times; no, this will not result in multiple frees. The object is freed only when its reference count reaches 0. That happens once every reference to the values you returned from tp_iternext has been released and linecache has given up its own reference (when PyFastFile_dealloc is called).

    4. I don't understand the last question, but the code here is basically correct.
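
    The counting in answers 1 to 3 can be checked with a small toy model. This is only a sketch: ToyObject, toy_incref, toy_decref and run_scenario are hypothetical names that imitate Py_INCREF/Py_DECREF in plain C++, not the real CPython API, and the "callers" are modelled as a vector of returned pointers.

```cpp
#include <vector>

// Toy stand-in for a PyObject: only the reference count is modelled.
struct ToyObject {
    long refcnt = 1;   // a freshly created object is owned by its creator
};

static int g_frees = 0;   // how many times an object was really freed

inline void toy_incref(ToyObject* o) { ++o->refcnt; }

// Imitates Py_DECREF: the object is freed only when the last owner lets go.
inline void toy_decref(ToyObject* o) {
    if (--o->refcnt == 0) {
        delete o;
        ++g_frees;
    }
}

struct ScenarioResult {
    long peak_refcnt;          // refcount after all returns
    int frees_before_dealloc;  // frees once every caller dropped its reference
    int frees_total;           // frees after the cache dropped its reference too
};

// The scenario from the question: one cached object returned `times` times.
ScenarioResult run_scenario(int times) {
    g_frees = 0;
    ScenarioResult r{};

    std::vector<ToyObject*> linecache;
    linecache.push_back(new ToyObject);   // refcnt == 1, owned by the cache

    std::vector<ToyObject*> returned;     // stands in for the Python callers
    for (int i = 0; i < times; ++i) {
        ToyObject* retval = linecache[0];
        toy_incref(retval);               // each caller gets its own reference
        returned.push_back(retval);
    }
    r.peak_refcnt = linecache[0]->refcnt; // 1 (cache) + times (callers)

    for (ToyObject* o : returned) toy_decref(o);   // callers release theirs
    r.frees_before_dealloc = g_frees;              // cache still owns the object

    for (ToyObject* o : linecache) toy_decref(o);  // what the dealloc loop does
    r.frees_total = g_frees;
    return r;
}
```

    With nine returns, as in the question's loop, the count peaks at 10, nothing is freed while the cache still holds its reference, and the object is freed exactly once at the end.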


    The one issue I can see here is "what is linecache, and who owns it?". If it is a global variable, it may end up being shared between multiple PyFastFile objects, which is probably wrong. Destroying a single PyFastFile releases every reference held in linecache, but because you don't pop_back or NULL the pointers, the vector is left holding stale pointers, and destroying a second PyFastFile would decrement them again.
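
    One way to address that concern is to make the cache a member of each object and empty it in the deallocator. The sketch below again uses a plain C++ toy model rather than the real CPython API; ToyLine, ToyFastFile, toy_cache_line and toy_dealloc are hypothetical names chosen for illustration.

```cpp
#include <vector>

// Toy stand-in for a cached line object; only the refcount is modelled.
struct ToyLine {
    long refcnt = 1;
};

static int g_line_frees = 0;   // counts real frees

// Imitates Py_DECREF.
inline void line_decref(ToyLine* o) {
    if (--o->refcnt == 0) {
        delete o;
        ++g_line_frees;
    }
}

// Each file object owns its own cache instead of sharing a global one.
struct ToyFastFile {
    std::vector<ToyLine*> linecache;
};

void toy_cache_line(ToyFastFile& f) {
    f.linecache.push_back(new ToyLine);   // the cache owns this reference
}

// Release the cache's references and forget the pointers, so that a
// repeated call cannot decrement the same objects a second time.
void toy_dealloc(ToyFastFile& f) {
    for (ToyLine* o : f.linecache) line_decref(o);
    f.linecache.clear();
}

struct Outcome {
    int frees_after_first;   // frees after destroying only the first file
    long other_refcnt;       // refcount of the second file's cached line
    int frees_total;         // frees after destroying both files
};

Outcome run_two_files() {
    g_line_frees = 0;
    ToyFastFile a, b;
    toy_cache_line(a);
    toy_cache_line(b);

    toy_dealloc(a);
    toy_dealloc(a);          // harmless: a's cache is already empty

    Outcome out{};
    out.frees_after_first = g_line_frees;
    out.other_refcnt = b.linecache[0]->refcnt;   // b is unaffected
    toy_dealloc(b);
    out.frees_total = g_line_frees;
    return out;
}
```

    In a real extension type the instance memory normally comes from tp_alloc, which does not run C++ constructors, so a std::vector member would likely need to be constructed explicitly (for example with placement new) or stored behind a pointer; that part is left out of the sketch.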