
What difficulties might arise from using mutable arguments to an `lru_cache` decorated function?


In a comment on Is there a decorator to simply cache function return values?, @gerrit points out a problem with passing mutable, but hashable, objects to a function decorated with functools.lru_cache:

If I pass a hashable, mutable argument, and change the value of the object after the first call of the function, the second call will return the changed, not the original, object. That is almost certainly not what the user wants.
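A short sketch of the scenario @gerrit describes (the `Box` class and `identity` function are made-up examples): with the default identity-based `__hash__`, mutating the argument does not change the cache key, so the second call hits the cache and returns the already-mutated object.

```python
import functools

class Box:
    """Mutable container; inherits identity-based __hash__/__eq__."""
    def __init__(self, value):
        self.value = value

@functools.lru_cache(maxsize=None)
def identity(box):
    # Returns its argument unchanged; the return value is cached.
    return box

b = Box(1)
r1 = identity(b)      # cache miss: the Box is stored as both key and value
b.value = 2           # mutate after the first call
r2 = identity(b)      # cache hit: the identity-based hash is unchanged
print(r2 is r1)       # True - the "cached" object reflects the mutation
print(r2.value)       # 2, not the original 1
```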

From my understanding, assuming the `__hash__()` function of the mutable object is defined to hash the member variables (rather than relying on the object's `id()`, which is the default for custom objects), changing the argument object will change its hash; hence, a second call to the `lru_cache`-decorated function should not use the cache.
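A sketch of this reasoning (the `Point` class is a hypothetical example): because `__hash__` is defined over the member values, mutating the object changes its hash, so the second call misses the cache and recomputes, though the entry stored under the old hash stays behind in the cache.

```python
import functools

calls = []

class Point:
    """Mutable but hashable: hash and equality follow member values."""
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __hash__(self):
        return hash((self.x, self.y))
    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

@functools.lru_cache(maxsize=None)
def norm_sq(p):
    calls.append(1)
    return p.x ** 2 + p.y ** 2

p = Point(3, 4)
print(norm_sq(p))   # 25, computed
p.x = 6             # mutation changes the hash of p
print(norm_sq(p))   # 52, recomputed: the new hash misses the cache
print(len(calls))   # 2; the entry stored under the old hash is stranded
```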

If the `__hash__()` function is defined correctly for the mutable argument(s), can any unintended behavior arise from using mutable arguments to `lru_cache`-decorated functions?


Solution

  • My comment was wrong/misleading and does not relate to lru_cache, but to any attempt to create a caching function that works more generically.

I was facing a need for a caching function that works for functions whose inputs and outputs are NumPy arrays, which are mutable and not hashable. Because NumPy arrays are not hashable, I could not use functools.lru_cache. I ended up writing something like this:

    import copy
    import functools

    def mutable_cache(maxsize=10):
        """In-memory cache like functools.lru_cache but for any object
    
        This is a re-implementation of functools.lru_cache.  Unlike
        functools.lru_cache, it works for any objects, mutable or not.
        Therefore, it returns a copy of the cached result, and it is
        silently wrong if a mutable argument has changed since the
        first call!  Use with caution!
    
        If you call the *resulting* function with a keyword argument
        'CLEAR_CACHE', the cache will be cleared.  Otherwise, cache is rotated
        when more than `maxsize` elements exist in the cache.  Additionally,
        if you call the resulting function with NO_CACHE=True, it doesn't
        cache at all.  Be careful with functions returning large objects.
        Everything is kept in RAM!
    
        Args:
            maxsize (int): Maximum number of return values to be remembered.
    
        Returns:
            New function that has caching implemented.
        """
    
        sentinel = object()
    
        def decorating_function(user_function):
            cache = {}
            cache_get = cache.get
            keylist = []  # don't make it too long
    
            def wrapper(*args, **kwds):
                # pop the control keywords so they are neither cached on
                # nor passed through to the wrapped function
                if kwds.pop("CLEAR_CACHE", False):
                    cache.clear()
                    keylist.clear()
                if kwds.pop("NO_CACHE", False):
                    return user_function(*args, **kwds)
                # naive key building; assumes str() captures the full value
                key = str(args) + str(kwds)
                result = cache_get(key, sentinel)
                if result is not sentinel:
                    # make sure we return a copy of the result; when a = f();
                    # b = f(), users should reasonably expect that a is not b.
                    return copy.copy(result)
                result = user_function(*args, **kwds)
                cache[key] = result
                keylist.append(key)
                if len(keylist) > maxsize:
                    # evict the oldest entry (FIFO rotation)
                    oldest = keylist.pop(0)
                    cache.pop(oldest, None)
                # return a copy on a miss as well, so the caller cannot
                # mutate the object kept in the cache
                return copy.copy(result)
    
            return functools.update_wrapper(wrapper, user_function)
    
        return decorating_function
    

    In my first version, I had omitted the copy.copy() call (which should really be copy.deepcopy()), which led to bugs if I changed the returned value and then called the cached function again. After I added the copy.copy() functionality, I realised that I was hogging memory in some cases, mainly because my function counts objects rather than total memory usage, which is non-trivial to measure in general in Python (although it should be easy if limited to NumPy arrays). Therefore I added the NO_CACHE and CLEAR_CACHE keywords to the resulting function, which do what their names suggest.

    After writing and using this function, I understand there is more than one good reason for functools.lru_cache to work only for functions with hashable input arguments. Anyone needing a caching function that works with mutable arguments needs to be very careful.
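The note above says that copy.copy() should really be copy.deepcopy(); a minimal illustration of the difference, using hypothetical nested-list data in place of the cached values:

```python
import copy

# Pretend this nested list is a value stored in the cache.
cached = [[1, 2], [3, 4]]

shallow = copy.copy(cached)
shallow[0].append(99)          # mutates the inner list shared with the cache
print(cached[0])               # [1, 2, 99] - the cache is silently corrupted

cached = [[1, 2], [3, 4]]      # reset the "cache"
deep = copy.deepcopy(cached)
deep[0].append(99)             # a deep copy shares nothing with the cache
print(cached[0])               # [1, 2] - the cached value is protected
```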