Tags: python, garbage-collection, cpython, pypy

How to debug GC in PyPy?


I've recently been trying to switch from CPython to PyPy. While chasing a bug, specifically a segmentation fault (exit code 139, SIGSEGV), I tried to investigate garbage collection through the gc module by looking at the gc.garbage list.

In CPython, I could for example run the following piece of code (taken from there with modifications) to check for lingering objects in the gc.garbage list:

import gc

gc.set_debug(gc.DEBUG_SAVEALL)

print(gc.get_count())
lst = []
lst.append(lst)
list_id = id(lst)
del lst
gc.collect()
for item in gc.garbage:
    # Print only the self-referencing list created above
    if id(item) == list_id:
        print(item)

This code works well in CPython, but raises the following error in PyPy:

AttributeError: module 'gc' has no attribute 'set_debug'

Indeed, print(dir(gc)) returns different lists of attributes and functions for the gc module on the two interpreters, and gc.set_debug() is not listed under PyPy:

# Under CPython
['DEBUG_COLLECTABLE', 'DEBUG_LEAK', 'DEBUG_SAVEALL', 'DEBUG_STATS', 'DEBUG_UNCOLLECTABLE', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'callbacks', 'collect', 'disable', 'enable', 'garbage', 'get_count', 'get_debug', 'get_objects', 'get_referents', 'get_referrers', 'get_stats', 'get_threshold', 'is_tracked', 'isenabled', 'set_debug', 'set_threshold']

# Under PyPy
['GcCollectStepStats', 'GcRef', '__doc__', '__loader__', '__name__', '__package__', '__spec__', '_dump_rpy_heap', '_get_stats', 'collect', 'collect_step', 'disable', 'disable_finalizers', 'dump_rpy_heap', 'enable', 'enable_finalizers', 'garbage', 'get_objects', 'get_referents', 'get_referrers', 'get_rpy_memory_usage', 'get_rpy_referents', 'get_rpy_roots', 'get_rpy_type_index', 'get_stats', 'get_typeids_list', 'get_typeids_z', 'hooks', 'isenabled']

If I understood correctly, calling gc.set_debug(gc.DEBUG_SAVEALL) keeps unreachable objects in gc.garbage instead of freeing them, so without it, gc.collect() would attempt to free the objects' memory. But I want to inspect the garbage list before that happens, because I suspect that freeing those objects triggers the segmentation fault I'm trying to track down.
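The behaviour described above can be checked with a small sketch. It assumes CPython semantics for DEBUG_SAVEALL; the helper name saved_in_garbage is made up for illustration, and on an interpreter without gc.set_debug (such as the PyPy build in question) it simply returns None rather than raising AttributeError.

```python
import gc


def saved_in_garbage(save_all):
    """Report whether a self-referencing list ends up in gc.garbage.

    Hypothetical helper for illustration only. Returns None when the
    interpreter has no gc.set_debug (for example PyPy).
    """
    if not hasattr(gc, "set_debug"):
        return None

    old_flags = gc.get_debug()
    gc.collect()          # get rid of cycles that existed before the test
    del gc.garbage[:]     # start from an empty garbage list
    gc.set_debug(gc.DEBUG_SAVEALL if save_all else 0)
    try:
        lst = []
        lst.append(lst)   # reference cycle: the list contains itself
        del lst
        gc.collect()
        # Look for a one-element list whose only element is itself
        return any(
            isinstance(obj, list) and len(obj) == 1 and obj[0] is obj
            for obj in gc.garbage
        )
    finally:
        gc.set_debug(old_flags)
        del gc.garbage[:]


if __name__ == "__main__":
    print("with DEBUG_SAVEALL:   ", saved_in_garbage(True))
    print("without DEBUG_SAVEALL:", saved_in_garbage(False))
```

The hasattr guard is the part that carries over to PyPy: it lets the same script run on both interpreters without crashing on the missing attribute.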

Despite looking through PyPy's documentation on garbage collection (like here, here) and other places (like here or here), I haven't been able to find a way to watch the garbage collection process as closely in PyPy as is possible in CPython. So, could someone explain how the differences between PyPy's and CPython's GC affect the above test code, and more precisely, how it is possible to inspect pending objects in gc.garbage before collection with PyPy?

I'm running Python 3.6.9 with PyPy 7.3.2. GCC is 8.4.0 for CPython, and 7.3.1 for PyPy.


Solution

  • It's not possible to do what you're trying to do. Even on CPython, gc.garbage contains nowhere near all the objects that are reclaimed, even with debug mode enabled; it contains only the objects found to be in reference cycles. That's unlikely to be relevant to anyone except the authors of the cycle-finding logic itself. And on PyPy, the notion of "being in a cycle" is even less relevant; as you have probably understood already from the various links you point to, PyPy's GC is quite different.

    No, there is no way to inspect all objects that are dying. In fact, PyPy's GC is optimized for objects that die young, and for these (typically 80%-90% of all objects in a program) the structure of the GC is such that there is no way to even know which objects are dying. The space they occupy is reclaimed in bulk, not one by one.

    In all likelihood, you're looking at your problem from the wrong end. If you can describe a bit more what your problem is, we can try to come up with better solutions. In the meantime, note that you can run pypy -X faulthandler to get at least some kind of traceback when you get a segfault.