Search code examples
pythondecoratormemoizationinternals

How can data remain persistent across multiple calls of decorated function?


The following function is meant to be used as a decorator that stores the results of already computed values. If the argument has already been calculated before, the function will return the value stored in the cache dictionary:

def cached(f):
    f.cache = {}
    def _cachedf(*args):
        if args not in f.cache:
            f.cache[args] = f(*args)

        return f.cache[args]

    return _cachedf

I realized (by mistake) that cache does not need to be an attribute of the function object. As a matter of facts, the following code works as well:

def cached(f):
    cache = {}   # <---- not an attribute this time!
    def _cachedf(*args):
        if args not in cache:
            cache[args] = f(*args)

        return cache[args]
    return _cachedf

I am having a hard time understanding how can the cache object be persistent across multiple calls. I tried calling multiple cached functions several times and could not find any conflict or problems.

Can anyone please help me understand how the cache variable still exists even after the _cachedf function is returned?


Solution

  • You are creating a closure here: The function _cachedf() closes over the variable cache from the enclosing scope. This keeps cache alive as long as the function object lives.

    Edit: Maybe I should add a few more details on how this works in Python and how CPython implements this.

    Let's look at a simpler example:

    def f():
        a = []
        def g():
            a.append(1)
            return len(a)
        return g
    

    Example usage in the interactive interpreter

    >>> h = f()
    >>> h()
    1
    >>> h()
    2
    >>> h()
    3
    

    During compilation of the module containing the function f(), the compiler sees that the function g() references the name a from the enclosing scope and memorises this external reference in the code object corresponding to the function f() (specifically, it adds the name a to f.__code__.co_cellvars).

    So what happens when the function f() is called? The first line create a new list object and binds it to the name a. The next line creates a new function object (using a the code object created during the compilation of the module) and binds it to the name g. The body of g() isn't executed at this point, and finally the funciton object is returned.

    Since the code object of f() has a note that the name a is referenced by local functions, a "cell" for this name is created when f() is entered. This cell contains the reference to the actual list object a is bound to, and the function g() gets a reference to this cell. That way, the list object and the cell are kept alive even when the funciton f() exits.