Tags: python, multiprocessing, overhead

Python, multiprocessing with classes


I implemented a few small statistical functions and parallelized them with multiprocessing. The overall structure of the code looks like this:

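import functools
import multiprocessing as mp

def worker(args, no):
    # Runs in a child process; assumes Stat is constructed from the parameters.
    f = Stat(args)
    f.fit()
    return f.result

class Stat:
    def fit(self):
        ...  # doing various things

    def bootstrap(self):
        parameter = ...  # set parameters for Stat
        # New name, so the module-level worker is not shadowed by a local.
        task = functools.partial(worker, parameter)

        with mp.Pool(mp.cpu_count()) as p:
            for i, _ in enumerate(p.imap_unordered(task, range(1000))):
                pass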

So the bootstrap method of the Stat class starts processes that run a function, which in turn creates an instance of Stat and calls its fit() method. I suspect this approach is quite inefficient. Would it be better to replace the class with plain functions? Or does using classes like this not affect multiprocessing performance?


Solution

  • It's not inefficient: using a class this way won't affect multiprocessing performance. It's just unorthodox. It would probably be a little cleaner to take bootstrap out of Stat, since it doesn't look like it benefits from being a method of that class.

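    import functools
    import multiprocessing as mp

    def worker(args, no):
        # Runs in a child process; assumes Stat is constructed from the parameters.
        f = Stat(args)
        f.fit()
        return f.result

    def bootstrap():
        parameter = ...  # set parameters for Stat
        # Bind the partial to a new name: assigning to `worker` here would
        # make it a local variable and raise UnboundLocalError.
        task = functools.partial(worker, parameter)

        with mp.Pool(mp.cpu_count()) as p:
            for i, _ in enumerate(p.imap_unordered(task, range(1000))):
                pass

    class Stat:
        def fit(self):
            ...  # doing various things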
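
To make the layout above concrete, here is a minimal runnable sketch. The body of Stat (a constructor that takes the data and a fit() that computes a sample mean), the seeded resampling in worker, and the parameters data, n_resamples and processes are all assumptions made for illustration. They are not the asker's actual code, which is elided in the question.

```python
import functools
import multiprocessing as mp
import random


class Stat:
    """Hypothetical stand-in for the real statistic being fitted."""

    def __init__(self, data):
        self.data = data
        self.result = None

    def fit(self):
        # Placeholder for the real fitting work: here, just the sample mean.
        self.result = sum(self.data) / len(self.data)
        return self


def worker(data, seed):
    # Runs in a child process. Seeding from the task index keeps each
    # resample reproducible and different from the others.
    rng = random.Random(seed)
    resample = rng.choices(data, k=len(data))
    return Stat(resample).fit().result


def bootstrap(data, n_resamples=1000, processes=None):
    # functools.partial binds the shared data; the pool supplies the seed.
    task = functools.partial(worker, data)
    with mp.Pool(processes) as pool:
        # chunksize batches tasks to cut down on inter-process messaging.
        return list(pool.imap_unordered(task, range(n_resamples), chunksize=50))
```

Call bootstrap(...) from under an `if __name__ == "__main__":` guard, which multiprocessing needs on platforms that start workers with spawn. Whether the work lives in a class or in a plain function, what crosses the process boundary is the pickled task function with its bound arguments, plus the returned result. The size of those objects is usually what matters, not the class itself.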