If my understanding is correct, in CPython objects are deleted as soon as their reference count reaches zero. Reference counting cannot free reference cycles that have become unreachable, but from time to time the interpreter tries to find and delete them (and you can trigger this manually by calling gc.collect()).
My question is, when do these interpreter-triggered cycle collection steps happen? What kind of events trigger them?
I am more interested in the CPython case, but would love to hear how this differs in PyPy or other python implementations.
The GC runs periodically, based on the difference between the number of allocations and the number of deallocations that have taken place since the last GC run.
See the gc.set_threshold() function:
In order to decide when to run, the collector keeps track of the number of object allocations and deallocations since the last collection. When the number of allocations minus the number of deallocations exceeds threshold0, collection starts.
You can access the current counts with gc.get_count(); this returns a tuple of the 3 counts GC tracks (the other 2 are to determine when to run deeper scans).
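To watch this happen, here is a minimal sketch for CPython 3.3 or later. It prints the thresholds and counts, then registers a hook in gc.callbacks to record each automatic collection while a loop creates unreachable reference cycles. The names Node, record and events are made up for the illustration, and the loop size is arbitrary; the exact thresholds and the number of collections you see will depend on your CPython version.

```python
import gc


class Node:
    """Hypothetical helper: an object that refers to itself."""

    def __init__(self):
        # Self-reference creates a cycle, so reference counting
        # alone can never free this object.
        self.ref = self


events = []  # generation number of every collection we observe


def record(phase, info):
    # gc calls each callback with phase "start" or "stop" and an
    # info dict that includes the generation being collected.
    if phase == "stop":
        events.append(info["generation"])


gc.enable()  # make sure automatic collection is switched on

print("thresholds:", gc.get_threshold())  # 3-tuple, version dependent
print("counts:", gc.get_count())          # 3-tuple of current counts

gc.callbacks.append(record)
try:
    for _ in range(100_000):
        Node()  # becomes unreachable cyclic garbage right away
finally:
    gc.callbacks.remove(record)

print("automatic collections observed:", len(events))
print("generations seen:", sorted(set(events)))
```

No call to gc.collect() appears anywhere in the loop, so every entry in events comes from a collection the interpreter started on its own once the allocation surplus crossed a threshold.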
The PyPy garbage collector operates entirely differently, as the GC process in PyPy is responsible for all deallocations, not just cyclic references. Moreover, the PyPy garbage collector is pluggable, meaning that how often it runs depends on what GC option you have picked. The default Minimark strategy doesn't even run at all when below a memory threshold, for example.
See the RPython toolchain Garbage Collector documentation for some details on their strategies, and the Minimark configuration options for more hints on what can be tweaked.
Ditto for Jython or IronPython; these implementations rely on the host runtime (Java and .NET) to handle garbage collection for them.