I'm wondering if you can leave a multiprocessing.Pool open across multiple different map calls. If this is possible, are there any pitfalls to such an approach?

My general use case would be to assign a pool to an instance attribute, such as self.pool, and then call self.pool across various map functions within the class - e.g., self.pool.map(func, args). My goal is to minimize the overhead of shutting down and then restarting a pool of workers each time, so that I can keep the same workers open indefinitely and pass them self.pool.map jobs as needed.

One potential pitfall I can see is that I would need to remember to close self.pool once I'm done using it.
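Roughly, the pattern I have in mind is something like this (the class, method, and worker names are just placeholders):

import multiprocessing as mp

def crunch(x):
    # placeholder worker; the real work would go here
    return x * x

class Analyzer:
    def __init__(self):
        self.pool = mp.Pool(4)   # started once, reused for every map call

    def first_pass(self, data):
        return self.pool.map(crunch, data)

    def second_pass(self, data):
        return self.pool.map(crunch, data)

    def close(self):
        self.pool.close()
        self.pool.join()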
Yes, you can use the pool many times, and your design is a good one. Sure, you have to close the pool eventually, but the enclosing class could itself have a def close(self) method that does this, and you can make calling it a requirement of using the class. You could even make the class a context manager if you want to (see the sketch after the example below). The need to catch exceptions and close things in a finally block is standard fare for Python programming.

As an example:
import multiprocessing as mp
import threading


class MyClassWithPool:
    def __init__(self, workers=None):
        # The pool is created lazily; the lock guards creation/teardown
        # in case the class is used from multiple threads.
        self._mp_pool_lock = threading.Lock()
        self._pool = None
        if workers is None:
            # default to half the CPUs, but never fewer than 2 workers
            self._pool_count = max(2, int(mp.cpu_count() * .50))
        else:
            self._pool_count = workers
        # initialize your other stuff...

    @property
    def mp_pool(self):
        # create the pool on first use and reuse it on every later call
        with self._mp_pool_lock:
            if self._pool is None:
                self._pool = mp.Pool(self._pool_count)
            return self._pool

    def close(self):
        with self._mp_pool_lock:
            if self._pool:
                self._pool.close()
                self._pool.join()
                self._pool = None
        # ... any other cleanup ...

    def __del__(self):
        # last-resort safety net; callers should still call close() explicitly
        self.close()


def do_some_stuff(i):
    # placeholder worker; must be a module-level function so it can be pickled
    return i


def do_other_stuff(i):
    return i


def main():
    my_data = MyClassWithPool()
    try:
        result_1 = my_data.mp_pool.map(do_some_stuff, range(5))
        print(result_1)
        result_2 = my_data.mp_pool.map(do_other_stuff, range(99))
        print(result_2)
    finally:
        # always shut the workers down, even if a map call raises
        my_data.close()


if __name__ == "__main__":
    main()
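For the context-manager variant mentioned above, a minimal sketch (reusing the MyClassWithPool from the example) would add __enter__ and __exit__ so the pool gets closed automatically:

class MyClassWithPool:
    # ... everything from the example above ...

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # runs whether the with-block exits normally or via an exception
        self.close()
        return False  # do not suppress exceptions


def main():
    with MyClassWithPool() as my_data:
        print(my_data.mp_pool.map(do_some_stuff, range(5)))
        print(my_data.mp_pool.map(do_other_stuff, range(99)))

This way the try/finally lives in one place instead of at every call site.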