I was building a manually cached (memoized) version of a Python Fibonacci function, and I noticed that I did not pass the cache as an argument in the recursive calls.
However, the function still works, in the sense that it is much faster than a non-memoized version.
When I added the cache as an argument to the recursive calls, the algorithm got slightly faster, but not significantly so.
Can someone please help me understand why the first version works at all, and whether the second version is more correct?
import time

def fib_cache(n, cache={}):
    if n in cache:
        return cache[n]
    if n == 0 or n == 1:
        return n
    result = fib_cache(n - 1) + fib_cache(n - 2)
    cache[n] = result
    return result

def fib_cache2(n, cache={}):
    if n in cache:
        return cache[n]
    if n == 0 or n == 1:
        return n
    result = fib_cache2(n - 1, cache) + fib_cache2(n - 2, cache)
    cache[n] = result
    return result
start = time.perf_counter()
fib_cache(30)
end = time.perf_counter()
print("Version 1. Seconds taken: {:.5f}".format(end - start))
start = time.perf_counter()
fib_cache2(30)
end = time.perf_counter()
print("Version 2. Seconds taken: {:.5f}".format(end - start))
This is because a def statement in Python is executed only once, and the default argument values are evaluated only once, at function definition time, not on every call. The same cache dict therefore persists across all calls to fib_cache, and the recursive calls that omit the argument all fall back to that one shared dict — which is why the first version is still memoized. With mutable (reference) types this behaviour often causes bugs or unexpected results. One workaround is:
def fib_cache3(n, cache=None):
    if cache is None:
        cache = {}
    if n in cache:
        return cache[n]
    if n == 0 or n == 1:
        return n
    result = fib_cache3(n - 1, cache) + fib_cache3(n - 2, cache)
    cache[n] = result
    return result
The advantage of this version is that it doesn't rely on a shared mutable default, and the cache becomes eligible for garbage collection once the top-level call returns.