Search code examples
pythonmemoization

Python Memoization Manual Caching


I was building a manual cache version of a memoized Python Fibonacci function and I noticed I did not pass the cache as an argument in the recursive calls.

However, the function still works in the sense that it is much faster than a non-memoized version.

When I added the cache as a function argument, the algorithm was faster, but not significantly so.

Can someone please help me to understand why the first version works at all, and if/whether the second version is more correct?

import time


def fib_cache(n, cache={}):
    if n in cache:
        return cache[n]
    if n == 0 or n == 1:
        return n
    result = fib_cache(n - 1) + fib_cache(n - 2)
    cache[n] = result
    return result


def fib_cache2(n, cache={}):
    if n in cache:
        return cache[n]
    if n == 0 or n == 1:
        return n
    result = fib_cache2(n - 1, cache) + fib_cache2(n - 2, cache)
    cache[n] = result
    return result

start = time.perf_counter()
fib_cache(30)
end = time.perf_counter()
print("Version 1. Seconds taken: {:.5f}".format(end - start))

start = time.perf_counter()
fib_cache2(30)
end = time.perf_counter()
print("Version 2. Seconds taken: {:.5f}".format(end - start))

Solution

  • This is because def in Python is executed only once and the default variables are intialized only once. In case of reference types this can lead to bugs/unexpected behaviour. One workaround is:

    def fib_cache3(n, cache=None):
        if cache is None:
            cache = {}
        if n in cache:
            return cache[n]
        if n == 0 or n == 1:
            return n
        result = fib_cache3(n - 1, cache) + fib_cache3(n - 2, cache)
        cache[n] = result
        return result
    

    The advantage of this version is that it doesn't depend on the default initialization of reference type, and it allows garbage collections once the function is executed.