python, python-multiprocessing

Can you leave Pool open across multiple different Map functions within Python's Multiprocessing Pool?


I'm wondering if you can leave a multiprocessing.Pool open across multiple different map functions? If this is possible, are there any pitfalls with such an approach?

My general use case would be to assign a pool to an instance attribute, such as self.pool, and then use self.pool in various map calls within the class, e.g., self.pool.map(func, args). My goal is to minimize the overhead of shutting down and restarting a pool of workers for each call: I would keep the workers open indefinitely and pass them self.pool.map jobs as needed.

One potential pitfall I can see would be that I would need to remember to close the self.pool within the class once I'm done using it.


Solution

  • Yes, you can reuse the same pool for as many map calls as you like, and your design is a good one. You do have to close the pool, but the enclosing class can expose a close() method that does so, and you can make calling it a requirement of using the class. You could even make the class a context manager. Catching exceptions and closing resources in a finally block is standard practice in Python.

    As an example,

    import multiprocessing as mp
    import threading
    
    class MyClassWithPool:
    
        def __init__(self, workers=None):
            self._mp_pool_lock = threading.Lock()
            self._pool = None
            if workers is None:
                # Default to half the available cores, but never fewer than 2 workers.
                self._pool_count = max(2, mp.cpu_count() // 2)
            else:
                self._pool_count = workers
            # initialize your other stuff...
    
        @property
        def mp_pool(self):
            with self._mp_pool_lock:
                if self._pool is None:
                    self._pool = mp.Pool(self._pool_count)
            return self._pool
    
        def close(self):
            with self._mp_pool_lock:
                if self._pool is not None:
                    self._pool.close()  # stop accepting new tasks
                    self._pool.join()   # wait for the workers to finish and exit
                    self._pool = None
            # ... any other cleanup ...
    
        def __del__(self):
            self.close()
        
    def do_some_stuff(i):
        return i
        
    def do_other_stuff(i):
        return i
    
    def main():
        my_data = MyClassWithPool()
        try:
            result_1 = my_data.mp_pool.map(do_some_stuff, range(5))
            print(result_1)
            result_2 = my_data.mp_pool.map(do_other_stuff, range(99))
            print(result_2)
        finally:
            my_data.close()
    
    if __name__ == "__main__":
        main()
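
The context-manager variant mentioned in the answer could be sketched roughly as follows. This is only an illustration, not part of the original answer: the names PooledWorker, square, and negate are hypothetical, and the default worker count (half the cores, at least one) is an assumption. The with block takes the place of the try/finally, and the same pool serves two different map calls.

```python
import multiprocessing as mp
import threading


class PooledWorker:
    """Hypothetical wrapper that owns one lazily created Pool."""

    def __init__(self, workers=None):
        self._lock = threading.Lock()
        self._pool = None
        # Assumption: default to half the cores, but never fewer than one.
        self._workers = workers or max(1, mp.cpu_count() // 2)

    @property
    def pool(self):
        # Create the pool on first use so constructing the object stays cheap.
        with self._lock:
            if self._pool is None:
                self._pool = mp.Pool(self._workers)
            return self._pool

    def close(self):
        # Safe to call more than once; does nothing if no pool was created.
        with self._lock:
            pool, self._pool = self._pool, None
        if pool is not None:
            pool.close()  # stop accepting new tasks
            pool.join()   # wait for the workers to exit

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not suppress exceptions raised in the with body


def square(x):
    # Worker functions live at module level so they can be pickled.
    return x * x


def negate(x):
    return -x


def main():
    with PooledWorker(workers=2) as worker:
        squares = worker.pool.map(square, range(5))
        negatives = worker.pool.map(negate, squares)
    print(squares)
    print(negatives)


if __name__ == "__main__":
    main()
```

Because the pool is created lazily, entering and leaving the with block without ever touching worker.pool starts no processes. One pitfall to keep in mind with a long-lived pool: functions passed to map are pickled by reference, so lambdas and nested functions will not work as worker functions.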