Search code examples
pythondictionaryweak-references

Safely iterating over WeakKeyDictionary and WeakValueDictionary


The documentation of Python 3.2's weakref module's WeakKeyDictionary and WeakValueDictionary have a note on iterating over these containers:

Note: Caution: Because a WeakKeyDictionary is built on top of a Python dictionary, it must not change size when iterating over it. This can be difficult to ensure for a WeakKeyDictionary because actions performed by the program during iteration may cause items in the dictionary to vanish “by magic” (as a side effect of garbage collection).

That seems rather dire as a specification of these container's behavior. Especially when running code that uses CPython's garbage collector (when using data structures that contain cycle) or using another Python implementation (e.g. Jython), then it sounds as if there is no safe way of iterating over these collections.

How can I safely iterate over these collections when the garbage collector may clear references at any point in my program? Having a solution for CPython is my priority but I'm interested about the issue on other implementations as well.

Is this maybe a safe way to iterate over a WeakKeyDictionary?

import weakref

d = weakref.WeakKeyDictionary()

...

for k, v in list(d.items()):
    ...

Solution

  • It is actually safe to iterate over a WeakKeyDictionary, WeakValueDictionary, or WeakSet in Python 2.7 or Python 3.1+. They put in an iteration guard that prevents weakref callbacks from removing references from the underlying dict or set during iteration all the way back in 2010, but the docs never got updated.

    With the guard in, if an entry dies before iteration reaches it, iteration will skip that entry, but it won't result in a segfault or a RuntimeError or anything. Dead entries will be added to a list of pending removals and handled later.

    Here's the guard (not threadsafe, despite the comment):

    class _IterationGuard:
        # This context manager registers itself in the current iterators of the
        # weak container, such as to delay all removals until the context manager
        # exits.
        # This technique should be relatively thread-safe (since sets are).
    
        def __init__(self, weakcontainer):
            # Don't create cycles
            self.weakcontainer = ref(weakcontainer)
    
        def __enter__(self):
            w = self.weakcontainer()
            if w is not None:
                w._iterating.add(self)
            return self
    
        def __exit__(self, e, t, b):
            w = self.weakcontainer()
            if w is not None:
                s = w._iterating
                s.remove(self)
                if not s:
                    w._commit_removals()
    

    Here's where the WeakKeyDictionary weakref callback checks the guard:

    def remove(k, selfref=ref(self)):
        self = selfref()
        if self is not None:
            if self._iterating:
                self._pending_removals.append(k)
            else:
                del self.data[k]
    

    And here's where WeakKeyDictionary.__iter__ sets the guard:

    def keys(self):
        with _IterationGuard(self):
            for wr in self.data:
                obj = wr()
                if obj is not None:
                    yield obj
    
    __iter__ = keys
    

    The same guard is used in the other iterators.


    If this guard did not exist, calling list(d.items()) wouldn't be safe either. A GC pass could happen inside the items iterator and remove items from the dict during iteration. (The fact that list is written in C would provide no protection.)


    Back in 2.6 and earlier, the safest way to iterate over a WeakKeyDictionary or WeakValueDictionary would have been to use items. items would return a list, and it would use the underlying dict's items method, which would have been (mostly?) not interruptible by GC. The dict API changes in 3.0 changed how keys/values/items worked, which is probably why the guard was introduced when it was.