I am trying to use klepto to do LRU caching. I would like to store the cache to disk, and am currently using klepto's dir_archive option for this. I have written the following code, largely based on the code in the klepto test scripts:
    import hashlib
    from klepto import lru_cache
    from klepto.archives import dir_archive

    def mymap(data):
        # key each entry by the SHA-256 hex digest of its payload (data must be bytes)
        return hashlib.sha256(data).hexdigest()

    class MyLRUCache:
        @lru_cache(cache=dir_archive(cached=False), keymap=mymap, ignore='self', maxsize=5)
        def __call__(self, data):
            return data
        call = __call__

        def store(self, data):
            self.call(data)

        # I would also appreciate a better way to do this, if possible.
        def lookup(self, key):
            return self.call.__cache__()[key]
This code appears to work fine until the cache reaches maxsize. At that point, instead of using LRU to remove a single item, lru_cache purges the entire cache! Below is the piece of klepto source code that does this (https://github.com/uqfoundation/klepto/blob/master/klepto/safe.py):
    # purge cache
    if _len(cache) > maxsize:
        if cache.archived():
            cache.dump()
            cache.clear()
            queue.clear()
            refcount.clear()
        else: # purge least recently used cache entry
            key = queue_popleft()
            refcount[key] -= 1
            while refcount[key]:
                key = queue_popleft()
                refcount[key] -= 1
            del cache[key], refcount[key]
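For reference, the else branch above relies on a deque plus a reference count to find the least recently used key. The following is a minimal standalone sketch of that bookkeeping in plain Python. It is not klepto's code: the names make_lru and access are made up for illustration, and it assumes every access appends the key to the queue and increments its count.

```python
from collections import defaultdict, deque


def make_lru(maxsize):
    """Return (cache, access) for a toy LRU cache (illustrative names, not klepto's API)."""
    cache = {}
    queue = deque()              # access history, oldest access on the left
    refcount = defaultdict(int)  # how many times each key still appears in queue

    def access(key, value):
        cache[key] = value
        queue.append(key)
        refcount[key] += 1
        if len(cache) > maxsize:
            # Pop history entries until we reach a key with no later access:
            # that key is the least recently used one.
            old = queue.popleft()
            refcount[old] -= 1
            while refcount[old]:
                old = queue.popleft()
                refcount[old] -= 1
            del cache[old], refcount[old]

    return cache, access


cache, access = make_lru(maxsize=2)
for key in ["a", "b", "a", "c"]:
    access(key, key.upper())
print(sorted(cache))  # ['a', 'c']: 'b' was the least recently used key
```

Only one entry is removed per overflow here, which is the behavior the archived branch skips in favor of clearing everything.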
So my question is, why does klepto purge "archived" caches? Is it possible to use lru_cache and dir_archive together?
Also, if my code looks completely nuts, I would really appreciate some sample code of how I should be writing this, since there was not much documentation for klepto.
ADDITIONAL NOTES:
I also tried defining dir_archive with cached=True. The in-memory cache still gets purged when maxsize is reached, but the contents of the cache are dumped to the archived cache at that point. I have several problems with this:
- The in-memory cache only holds entries until maxsize is reached, at which point it is wiped.
- The archived cache is not limited by maxsize. Every time maxsize is reached by the in-memory cache, all items in the in-memory cache are dumped to the archived cache, regardless of how many are already there.

The answer is that you couldn't before your question, but now you can.
If you get the most recent klepto from GitHub and provide the new flag purge=False, then you get the behavior you are looking for. I just added this in response to your question.
In your case:
    lru_cache(cache=dir_archive(cached=False), keymap=mymap, ignore='self', maxsize=5, purge=False)
Or, for example:
    from klepto import lru_cache
    from klepto.archives import dict_archive

    # purge=True: on overflow, the whole in-memory cache is dumped to the archive and cleared
    @lru_cache(maxsize=3, cache=dict_archive('test'), purge=True)
    def identity(x):
        return x

    identity(1)
    identity(2)
    identity(3)
    ic = identity.__cache__()
    assert len(ic.keys()) == 3
    assert len(ic.archive.keys()) == 0
    identity(4)
    assert len(ic.keys()) == 0
    assert len(ic.archive.keys()) == 4
    identity(5)
    assert len(ic.keys()) == 1
    assert len(ic.archive.keys()) == 4

    # purge=False: on overflow, only the least recently used entry leaves memory for the archive
    @lru_cache(maxsize=3, cache=dict_archive('test'), purge=False)
    def inverse(x):
        return -x

    inverse(1)
    inverse(2)
    inverse(3)
    ic = inverse.__cache__()
    assert len(ic.keys()) == 3
    assert len(ic.archive.keys()) == 0
    inverse(4)
    assert len(ic.keys()) == 3
    assert len(ic.archive.keys()) == 1
    inverse(5)
    assert len(ic.keys()) == 3
    assert len(ic.archive.keys()) == 2
Please add a ticket if this doesn't do what you were expecting. Thanks for the suggestion.
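For intuition, the difference between the two modes can be modeled without klepto at all. The sketch below is a toy two-tier cache in plain Python, assuming an OrderedDict for the in-memory tier and an ordinary dict standing in for the archive. ToyArchivedLRU and its get method are hypothetical names, and this is not how klepto is implemented; it only reproduces the counts shown in the asserts above.

```python
from collections import OrderedDict


class ToyArchivedLRU:
    """Toy model: a bounded in-memory tier backed by an unbounded 'archive' dict."""

    def __init__(self, maxsize, purge):
        self.maxsize = maxsize
        self.purge = purge
        self.memory = OrderedDict()  # ordered from least to most recently used
        self.archive = {}            # stands in for the on-disk archive

    def get(self, key, compute):
        if key in self.memory:
            self.memory.move_to_end(key)  # mark as most recently used
            return self.memory[key]
        # Memory miss: fall back to the archive, else compute the value.
        value = self.archive[key] if key in self.archive else compute(key)
        self.memory[key] = value
        if len(self.memory) > self.maxsize:
            if self.purge:
                # Dump everything to the archive, then wipe the memory tier.
                self.archive.update(self.memory)
                self.memory.clear()
            else:
                # Move only the least recently used entry to the archive.
                old_key, old_value = self.memory.popitem(last=False)
                self.archive[old_key] = old_value
        return value


for purge in (True, False):
    toy = ToyArchivedLRU(maxsize=3, purge=purge)
    for x in range(1, 6):
        toy.get(x, lambda v: -v)
    print(purge, len(toy.memory), len(toy.archive))
```

With purge=True the memory tier ends up holding one entry and the archive four; with purge=False the memory tier stays at three entries and the archive holds the two evicted ones.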