
Optimizing multiprocessing.Pool with expensive initialization


Here is a complete, simple working example:

import multiprocessing as mp
import time
import random


class Foo:
    def __init__(self):
        # some expensive set up function in the real code
        self.x = 2
        print('initializing')

    def run(self, y):
        time.sleep(random.random() / 10.)
        return self.x + y


def f(y):
    foo = Foo()
    return foo.run(y)


def main():
    pool = mp.Pool(4)
    for result in pool.map(f, range(10)):
        print(result)
    pool.close()
    pool.join()


if __name__ == '__main__':
    main()

How can I modify it so that Foo is initialized only once per worker process rather than once per task? Basically, I want __init__ called 4 times, not 10. I am using Python 3.5.


Solution

  • The intended way to deal with things like this is via the optional initializer and initargs arguments to the Pool() constructor. They exist precisely to give you a way to do stuff exactly once when a worker process is created. So, e.g., add:

    def init():
        global foo
        foo = Foo()
    

    and change the Pool creation to:

    pool = mp.Pool(4, initializer=init)
    

    If you need to pass arguments to your per-process initialization function, also add an appropriate initargs=(...) argument: a tuple of positional arguments that each worker passes to the initializer.

    Note: of course you should also remove the

    foo = Foo()
    

    line from f(), so that your function uses the global foo created by init().
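Putting the pieces of this answer together, the full program might look like the sketch below. It is the question's code with init() added, the per-call Foo() removed from f(), and the pool created with initializer=init. The os.getpid() in the print is an addition for illustration only, so you can see which worker each message comes from. With the default Pool settings, "initializing" should be printed 4 times (once per worker), regardless of how many tasks are mapped.

```python
import multiprocessing as mp
import os
import random
import time


class Foo:
    def __init__(self):
        # some expensive set up function in the real code
        self.x = 2
        # os.getpid() added for illustration: shows which worker initialized
        print('initializing in process', os.getpid())

    def run(self, y):
        time.sleep(random.random() / 10.)
        return self.x + y


# Per-process global; each worker sets its own copy in init().
foo = None


def init():
    # Called once in each worker process when the pool starts it.
    global foo
    foo = Foo()


def f(y):
    # Reuses the Foo instance that init() created in this worker.
    return foo.run(y)


def main():
    pool = mp.Pool(4, initializer=init)
    results = pool.map(f, range(10))  # results come back in input order
    pool.close()
    pool.join()
    for result in results:
        print(result)
    return results


if __name__ == '__main__':
    main()
```

One caveat: if you pass maxtasksperchild to Pool(), workers are periodically replaced by fresh processes, and each replacement runs the initializer again, so Foo would then be created more than 4 times.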
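For the initargs case, here is a minimal sketch assuming a hypothetical variant of Foo whose setup takes a parameter x (the original Foo takes no arguments). The tuple given as initargs is unpacked into the initializer's arguments in every worker, so each of the 4 workers calls init(2) once.

```python
import multiprocessing as mp


class Foo:
    # Hypothetical variant: the expensive setup depends on a parameter x.
    def __init__(self, x):
        self.x = x  # stands in for expensive setup driven by x

    def run(self, y):
        return self.x + y


# Per-process global, set once per worker by init().
foo = None


def init(x):
    # Receives the values from initargs; runs once in each worker.
    global foo
    foo = Foo(x)


def f(y):
    return foo.run(y)


def main():
    # initargs must be a tuple; (2,) makes every worker call init(2).
    pool = mp.Pool(4, initializer=init, initargs=(2,))
    results = pool.map(f, range(10))
    pool.close()
    pool.join()
    return results


if __name__ == '__main__':
    print(main())
```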