
Correct cyclic garbage collection in extension modules


Two sections of Python 2.7's documentation mention adding cyclic garbage collection (CGC) support to container objects defined in extension modules.

The Python/C API Reference Manual gives two rules:

  1. The memory for the object must be allocated using PyObject_GC_New() or PyObject_GC_NewVar().
  2. Once all the fields which may contain references to other containers are initialized, it must call PyObject_GC_Track().

In Extending and Embedding the Python Interpreter, however, the Noddy example suggests that setting the Py_TPFLAGS_HAVE_GC flag and filling in the tp_traverse and tp_clear slots is sufficient to enable CGC support; the two rules above are not followed at all.

When I modified the Noddy example to actually follow the rules, using PyObject_GC_New()/PyObject_GC_Del() and PyObject_GC_Track()/PyObject_GC_UnTrack(), it surprisingly failed with an assertion error:

Modules/gcmodule.c:348: visit_decref: Assertion "gc->gc.gc_refs != 0" failed. refcount was too small

What is the correct and safe way to implement CGC? What would be a neat example of a container object with CGC support?


Solution

  • Under most normal circumstances you shouldn't need to do the tracking/untracking yourself. The documentation does say this, but it doesn't make it particularly clear. In the case of the Noddy example you definitely don't need to.

    The short version is that a type object (PyTypeObject) contains two function pointers: tp_alloc and tp_free. By default, tp_alloc calls all the right functions when an instance is created (if Py_TPFLAGS_HAVE_GC is set), and tp_free untracks the instance when it is destroyed.

    The Noddy documentation says (at the end of the section):

    That’s pretty much it. If we had written custom tp_alloc or tp_free slots, we’d need to modify them for cyclic-garbage collection. Most extensions will use the versions automatically provided.

    Unfortunately, the one place that fails to make it clear that you don't need to do this yourself is the Supporting Cyclic Garbage Collection documentation.


    Detail:

    Noddy objects are allocated by a function called Noddy_new, which is placed in the tp_new slot of the type object. According to the documentation, the main thing the "new" function should do is call the tp_alloc slot. You typically don't write tp_alloc yourself; it defaults to PyType_GenericAlloc().

    Looking at PyType_GenericAlloc() in the Python source shows two places where its behaviour depends on PyType_IS_GC(type). First, it calls _PyObject_GC_Malloc instead of PyObject_Malloc; second, it calls _PyObject_GC_TRACK(obj). [Note that all PyObject_New really does is call PyObject_Malloc and then PyObject_Init, which sets the type and the reference count; it does not call tp_init.]
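
    If you want to see the effect of that tracking step without reading the C source, gc.is_tracked() (available since Python 2.7) reports whether the collector knows about an object. The sketch below is only an illustration: Container is a hypothetical pure-Python stand-in, not the Noddy type, and tracking_report is a made-up helper name. With a built Noddy module you would pass one of its instances to gc.is_tracked() instead.

```python
import gc


class Container(object):
    """Hypothetical stand-in for an extension type with Py_TPFLAGS_HAVE_GC."""

    def __init__(self):
        self.items = []


def tracking_report():
    """Return whether the collector tracks a few representative objects."""
    return {
        "int": gc.is_tracked(42),               # atomic object
        "str": gc.is_tracked("noddy"),          # atomic object
        "list": gc.is_tracked([]),              # built-in container
        "instance": gc.is_tracked(Container()), # instance of a GC-enabled type
    }


if __name__ == "__main__":
    for name, tracked in sorted(tracking_report().items()):
        print("%-8s tracked: %s" % (name, tracked))
```

    On CPython I would expect the atomic objects to be untracked and the list and the instance to be tracked. An instance of an extension type that sets Py_TPFLAGS_HAVE_GC and keeps the default tp_alloc should report True in the same way, with no explicit PyObject_GC_Track() call in your code.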

    Similarly, on deallocation you call the tp_free slot, which is automatically set to PyObject_GC_Del for types with Py_TPFLAGS_HAVE_GC. PyObject_GC_Del includes code that does the same as PyObject_GC_UnTrack, so a separate call to untrack is unnecessary.
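
    To check the whole chain end to end (allocation, tracking, tp_traverse, tp_clear, tp_free), you can build a reference cycle and see whether gc.collect() finds it. This is a minimal sketch, assuming a hypothetical Node class in place of the extension type; the helper name count_collected_cycle and its attr parameter are my own. For Noddy you would pass its constructor and an attribute name such as "first".

```python
import gc


class Node(object):
    """Hypothetical stand-in for a GC-enabled extension type."""

    def __init__(self):
        self.other = None


def count_collected_cycle(factory=Node, attr="other"):
    """Build a two-object cycle, drop it, and return what gc.collect() reports."""
    was_enabled = gc.isenabled()
    gc.disable()              # keep automatic collections out of the picture
    try:
        gc.collect()          # flush garbage that existed before the check
        a, b = factory(), factory()
        setattr(a, attr, b)   # a -> b
        setattr(b, attr, a)   # b -> a, closing the cycle
        del a, b              # refcounts stay above zero because of the cycle
        return gc.collect()   # number of unreachable objects found
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print("unreachable objects found:", count_collected_cycle())
```

    A result of at least 2 means both objects were found unreachable by the collector. A result of 0 would suggest that the instances are not tracked or that tp_traverse does not visit the attribute holding the reference.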