python, memoization, functools

Caching Python function results using only subset of arguments as identifier


Is there an easy way to cache function results in Python based on a single identifier argument? For example, suppose my function has three arguments: arg1, arg2 and id. Is there a simple way to cache the function result based only on the value of id? That is, whenever id takes the same value, the cached function would return the same result, regardless of arg1 and arg2.

Background: I have a time-consuming function that is called repeatedly, in which arg1 and arg2 are lists and dictionaries composed of large numpy arrays. Because these objects are not hashable, functools.lru_cache doesn't work as is. However, there are only a handful of specific combinations of arg1 and arg2, hence my idea to manually specify an id that takes a unique value for each possible combination of arg1 and arg2.
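
For illustration, here is a minimal sketch of why functools.lru_cache fails in this situation. The function `slow` and its arguments are made up for the example; the point is only that lru_cache builds its key from all arguments, so an unhashable list or dict raises a TypeError before the function body runs.

```python
import functools

@functools.lru_cache(maxsize=None)
def slow(arg1, arg2, id):
    # Hypothetical stand-in for the expensive computation.
    return len(arg1)

raised = False
try:
    # lru_cache hashes every argument to build its key; a list is not hashable.
    slow([1, 2, 3], {'k': 1}, 'a')
except TypeError as exc:
    raised = True
    print(f'lru_cache failed: {exc}')
```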


Solution

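  • import functools

    def cache(fun):
        # The dict is stored as an attribute of the decorator itself (PEP 232),
        # so it is shared by, and reset for, every function decorated with @cache.
        cache.cache_ = {}

        @functools.wraps(fun)  # keep the name and docstring of the wrapped function
        def inner(arg1, arg2, id):  # note: `id` shadows the built-in of the same name
            if id not in cache.cache_:
                print(f'Caching {id}')  # to check when it is cached
                cache.cache_[id] = fun(arg1, arg2, id)
            return cache.cache_[id]
        return inner

    @cache
    def function(arg1, arg2, arg3):
        # Placeholder for the expensive computation; it returns None,
        # so None is the value stored in the cache for each id.
        print('something')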
    

    You can create your own decorator, as suggested by DarrylG. To check that it only caches values of id it has not seen before, you can add a print(cache.cache_) inside the if id not in cache.cache_: branch.

    You can make cache_ a function attribute (PEP 232) by writing cache.cache_. This gives you direct access to the dictionary that caches the results, and when you want to reset it you can call cache.cache_.clear().

    function(1, 2, 'a')
    function(11, 22, 'b')
    function(11, 22, 'a')
    function([111, 11], 222, 'a')
    
    print(f'Cache {cache.cache_}') # view previously cached results
    cache.cache_.clear() # clear cache
    print(f'Cache {cache.cache_}') # cache is now empty
    
    # call some function again to populate cache
    function(1, 2, 'a')
    function(11, 22, 'b')
    function(11, 22, 'a')
    function([111, 11], 222, 'a')
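
One caveat with the decorator above: because the dictionary lives on cache itself, every function decorated with @cache shares it, and applying the decorator again resets it. If that matters, one possible variant is to give each decorated function its own dictionary. The sketch below is only an illustration under that assumption; the names cache_by_id, store and expensive are hypothetical and not part of the original answer.

```python
import functools

def cache_by_id(fun):
    store = {}  # one dictionary per decorated function

    @functools.wraps(fun)
    def inner(arg1, arg2, id):
        if id not in store:
            store[id] = fun(arg1, arg2, id)
        return store[id]

    inner.cache_ = store  # exposed so callers can inspect or clear it
    return inner

calls = []  # records which ids actually triggered a computation

@cache_by_id
def expensive(arg1, arg2, id):
    calls.append(id)
    return sum(arg1) + len(arg2)

expensive([1, 2], {'a': 1}, 'x')  # computed
expensive([9, 9], {}, 'x')        # served from the cache, arg1 and arg2 ignored
expensive([1], {}, 'y')           # computed
print(calls, expensive.cache_)
```

With this layout the cache is reset with expensive.cache_.clear() rather than cache.cache_.clear().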
    

    Edit: Addressing a new comment by @Bob (OP): in most cases returning a reference to the same cached object would suffice, but OP's use case seems to require a fresh copy of the result on every call. This is possibly because callers treat each call to function(arg1, arg2, arg3) as distinct based on arg1, arg2 and arg3, whereas inside the cache decorator uniqueness is defined by id alone. In that case, returning the same reference to a mutable object would lead to undesired behavior, since a change made by one caller would be visible to every later caller. As mentioned in the same comment, the return statement in the inner function should be changed from return cache.cache_[id] to return copy.deepcopy(cache.cache_[id]), which also requires import copy.
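
A minimal sketch of that change, mirroring the decorator from the answer with only the return statement altered. The function build and its return value are made up to show the effect of the copy.

```python
import copy

def cache(fun):
    cache.cache_ = {}

    def inner(arg1, arg2, id):
        if id not in cache.cache_:
            cache.cache_[id] = fun(arg1, arg2, id)
        # Return an independent copy so callers cannot mutate the cached object.
        return copy.deepcopy(cache.cache_[id])
    return inner

@cache
def build(arg1, arg2, id):
    # Hypothetical function returning a mutable result.
    return {'values': list(arg1)}

first = build([1, 2], None, 'a')
first['values'].append(99)          # mutates only the caller's copy
second = build([7, 8], None, 'a')   # cache hit; the stored result is untouched
print(second)
```

Note that deepcopy has its own cost for large numpy arrays, so whether it is worth it depends on how the results are used.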