
How to properly memoize when using a ProcessPoolExecutor?


I suspect that something like:

import sys
import concurrent.futures
from concurrent.futures import ProcessPoolExecutor

@memoize
def foo(arg):
    return something_expensive(arg)

def main():
    with ProcessPoolExecutor(10) as pool:
        futures = {pool.submit(foo, arg): arg for arg in args}
        for future in concurrent.futures.as_completed(futures):
            arg = futures[future]
            try:
                result = future.result()
            except Exception as e:
                sys.stderr.write("Failed to run foo() on {}\nGot {}\n".format(arg, e))
            else:
                print(result)

won't work (assuming @memoize is a typical dict-based cache), because each worker in a multiprocessing pool runs in its own process and processes don't share memory: every worker would fill its own private copy of the cache. At least it doesn't seem to work.

What is the correct way to memoize in this scenario? Ultimately I'd also like to pickle the cache to disk and load it on subsequent runs.


Solution

  • You can use a Manager.dict from multiprocessing, which uses a Manager process to proxy a shared dict between workers; its contents can also be copied out and pickled to disk. In the end I went with multithreading instead, because this is an IO-bound app: threads share one memory space, so I don't need the Manager machinery and can just use a plain dict.
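A minimal sketch of the Manager.dict approach, including persisting the cache between runs as the question asks. The proxy is passed to the worker explicitly (a decorator closure would not pickle cleanly for submission to other processes); `expensive` and the `cache.pkl` filename are placeholders, not anything from the original post:

```python
import pickle
from concurrent.futures import ProcessPoolExecutor, as_completed
from multiprocessing import Manager

def expensive(x):
    # stand-in for the real expensive computation
    return x * x

def foo(cache, arg):
    # the DictProxy given to each worker talks back to the Manager,
    # so all processes see one shared cache
    if arg in cache:
        return cache[arg]
    result = expensive(arg)
    cache[arg] = result
    return result

def main(args):
    with Manager() as manager:
        cache = manager.dict()
        # seed the cache from a previous run, if a cache file exists
        try:
            with open("cache.pkl", "rb") as f:
                cache.update(pickle.load(f))
        except FileNotFoundError:
            pass
        with ProcessPoolExecutor(4) as pool:
            futures = {pool.submit(foo, cache, arg): arg for arg in args}
            for future in as_completed(futures):
                print(futures[future], future.result())
        # persist for subsequent runs; convert the proxy to a plain dict
        # because the proxy itself is only valid while the Manager lives
        with open("cache.pkl", "wb") as f:
            pickle.dump(dict(cache), f)

if __name__ == "__main__":
    main(range(5))
```

Note that every cache lookup is a round-trip to the Manager process, so this only pays off when the memoized work is much more expensive than that IPC.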
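The multithreaded route the answer settled on can be sketched like this, assuming a simple single-argument memoize decorator; since threads share one address space, an ordinary dict works, and exposing it as an attribute makes it easy to pickle later:

```python
import functools
from concurrent.futures import ThreadPoolExecutor, as_completed

def memoize(fn):
    # a plain dict is enough here because threads share memory;
    # a lock would make check-then-set atomic, but for a pure function
    # the worst case is computing the same key twice
    cache = {}

    @functools.wraps(fn)
    def wrapper(arg):
        if arg not in cache:
            cache[arg] = fn(arg)
        return cache[arg]

    wrapper.cache = cache  # exposed so the cache can be pickled to disk
    return wrapper

@memoize
def foo(arg):
    return arg * arg  # stand-in for the real IO-bound work

with ThreadPoolExecutor(10) as pool:
    futures = {pool.submit(foo, arg): arg for arg in (1, 2, 3, 2, 1)}
    results = sorted(future.result() for future in as_completed(futures))

print(results)  # [1, 1, 4, 4, 9]
```

For IO-bound work this avoids both pickling overhead and the Manager round-trips; `pickle.dump(foo.cache, f)` at shutdown covers the disk-persistence part.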