Search code examples
pythonlinked-listgarbage-collection

Does breaking a linked list help Python's garbage collector?


I was wondering whether — when I have the opportunity — I should break up a singly-connected linked list in Python when I don't need it anymore, or if I should not bother myself with it.

List example:

class Link:
    def __init__(self, next=None):
        self.next = next


L = Link(Link(Link(Link())))

Breaking the links:

def break_(link):
    if link is not None:
        break_(link.next)
        link.next = None  # link broken

# application
break_(L)

Tree example:

class Node:
    def __init__(self, l=None, r=None):
        self.l = l
        self.r = r

T = \
Node(
    Node(
        Node(), 
        Node()
    ),
    Node(
        Node(), 
        Node()
    ),
)

Breaking the links:

def break_(node):
    if node is not None:
        break_(node.l)
        break_(node.r)
        node.l = None  # link broken
        node.r = None  # link broken
        
# application
break_(T)

Basically, what I wonder is, what is the performance-wise optimal way to write your code in this situation. Is the GC prepared to deal with large linked structures? Doesn't it have to run potentially long DFSes with counters and such to determine which objects can be freed? Isn't it simpler to just break the links and give the GC a bunch of loose objects with zero references anywhere?

I could benchmark this, but I'm looking for an explanation (ideally from Karl), if it's available. Thank you.


Solution

  • You can add a __del__ method to your class to see when the object is about to be cleaned up:

    class Node:
        def __init__(self, name, l=None, r=None):
            self.name = name
            self.l = l
            self.r = r
    
        def __del__(self):
            print(f'__del__ {self.name}')
    
    
    T = \
    Node('0',
        Node('0L',
            Node('0LL'), 
            Node('0LR'),
        ),
        Node('0R',
            Node('0RL'), 
            Node('0RR'),
        ),
    )
    del T  # decreases the reference count of T by 1
    

    What happens with del T is that the reference count of the object referred to by T is decreased by 1. In this case, it happens that the reference count reaches 0. CPython knows this because it stores a reference count together with each object. When the reference count of an object reaches 0, the object will be marked for cleanup. This cleanup happens before control is given back to user code. So for the above example, it means the object that T originally referred to is immediately cleaned up. This first invokes __del__ and then the corresponding C-code for cleanup. This in turn goes through the object's instance dict and decreases the reference count for each other object stored there by 1. If another object's reference count reaches 0 in that process, it is also marked for cleanup. Then this procedure repeats until no objects are marked for cleanup and control is given back to the main loop.

    This is the output from the example:

    __del__ 0
    __del__ 0L
    __del__ 0LL
    __del__ 0LR
    __del__ 0R
    __del__ 0RL
    __del__ 0RR
    

    As mentioned above, when 0 is cleaned up, all other objects it references have their reference count reduced by 1 (i.e. 0L and 0R) and get cleaned up as well if the ref count reaches 0.

    If you manually break the links of the list by setting the corresponding instance attribute to None this adds additional overhead, as it has to actually modify all of these objects in memory (updating their instance dict). The reference count of the child objects reaching 0 is more of a byproduct here. Also note that the order of cleanup is changed since the clean function goes depth first:

    def clean(node):
        if node is not None:
            clean(node.l)
            clean(node.r)
            node.l = None  # this updates the object in memory
            node.r = None
    
    clean(T)
    

    produces

    __del__ 0LL
    __del__ 0LR
    __del__ 0RL
    __del__ 0RR
    __del__ 0L
    __del__ 0R
    __del__ 0