python-3.x garbage-collection cpython python-internals

Understanding CPython garbage collection generations


I am trying to improve my understanding of how memory management works in Python, and I am confused about the concept of generations in Python's garbage collector module.

My understanding is as follows:

All newly created objects start in generation 0. Once a threshold is reached (700 by default for gen 0), Python runs a collection on that generation, and any surviving objects age into the next generation.

Given the above, I can't understand the output below.

import gc
import sys
x = 1
print(gc.get_count())
gc.collect()
print(gc.get_count())

Output

(64, 1, 1)
(0, 0, 0)

Firstly, I've only run one line of code and I've already got objects in gen 1 and gen 2, implying that garbage collection has already occurred at least twice. How is this possible? Is there any way to find out which objects are in each generation?

Secondly, why do I have 0 references in all generations after the collection? I can still run print(x) without getting an error. Doesn't this mean there is still a reference to x, so it should exist in one of the generations?


Solution

  • gc.get_count() shows you the counter for each generation, which counts up towards that generation's threshold.

    It is not the number of objects in each generation. It is the counter that triggers a collection of that generation once it passes the threshold. For gen0 it counts allocations minus deallocations of objects the collector tracks.

    For example, if I start with (0,0,0) on the counter, running x = [[] for i in range(100)] will set the counter to (101,0,0).

    Running y = [[] for i in range(600)] then pushes the gen0 counter past its threshold of 700, so a gen0 collection runs and the counter flips to roughly (0,1,0). At that point every list that already existed moves to gen1, because it survived a gen0 collection.

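    The older generations have their own thresholds, which gc.get_threshold() reports and which are typically (700, 10, 10). The gen1 counter counts gen0 collections, and the gen2 counter counts gen1 collections. So when the counter is at about (700,10,0) and another object is allocated, gen0 and gen1 are collected together and the counter goes to (0,0,1). When the counter reaches about (700,10,10) and an object is allocated, or when you call gc.collect() (which runs a gen2 collection), the counter resets to (0,0,0).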

    To get the objects in a given generation, use gc.get_objects(generation=gen). The generation argument is available in Python 3.8 and later.

    Regarding the garbage collection before your code runs: when Python starts, it creates lots of objects before it even loads your script. For example, sys.modules lists the modules that have already been imported. Creating those objects pushes the counters past their thresholds, so the garbage collector has already run automatically by the time your first line executes.
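
The points above can be checked with a small experiment. This is only a sketch, assuming a standard CPython build on Python 3.8 or later. The helper generation_of and the names lists and marker are made up for illustration and are not part of the gc module. Automatic collection is switched off during the measurement so that a background collection cannot reset the counters halfway through.

```python
import gc


def generation_of(obj):
    """Return the index of the generation that currently holds obj,
    or None if the collector does not track obj at all.
    (Hypothetical helper, built on gc.get_objects(generation=...).)"""
    for gen in range(3):
        if any(o is obj for o in gc.get_objects(generation=gen)):
            return gen
    return None


thresholds = gc.get_threshold()
print("thresholds:", thresholds)

gc.collect()   # start from freshly reset counters
gc.disable()   # no automatic collections while we measure
try:
    before = gc.get_count()[0]
    lists = [[] for _ in range(100)]
    after = gc.get_count()[0]
    # 100 inner lists plus the outer list were allocated in between
    print("gen0 counter grew by:", after - before)

    marker = []                     # a new, GC-tracked object
    fresh = generation_of(marker)   # generation right after creation
    gc.collect()                    # full collection; marker survives
    aged = generation_of(marker)    # generation after surviving it
finally:
    gc.enable()

print("marker before collect: gen", fresh)
print("marker after collect:  gen", aged)

# Plain ints are not tracked by the cyclic collector, so x = 1 never
# appears in any generation even though x is still alive.
print("is 1 tracked?", gc.is_tracked(1))
```

The last line bears on the second part of the question: the generations only hold container objects that could take part in reference cycles. An int such as x = 1 is managed by reference counting alone, so it stays alive without showing up in any generation.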