
Parallelization of a simple iteration function in Python


I am trying to use multiprocessing to speed up a loop inside one of my methods, but I have not been able to get it to work.

My function is this:

    def __build__(self):
        for i in range(len(self.set)):
            degree = self.__degree__(self.set[i])
            self.subsets.append(degree)

I call it using

    self.__build__()

--

I tried to make a parallel version like that:

    def __buildParallel__(self, i):
        degree = self.__degree__(self.set[i])
        self.subsets[i].append(degree)

I call the parallel version using

    import multiprocessing as mp
    pool = mp.Pool()  
    pool.map(self.__buildParallel__, self.set)

Example:

I want to parallelize the build function

    import math

    class Testing(object):
        def __init__(self, inputSet):
            self.set = inputSet
            self.subsets = []
            self.__build__()

        def __build__(self):
            for i in range(len(self.set)):
                degree = self.__degree__(self.set[i])
                self.subsets.append(degree)

        def __degree__(self, element):
            return element[0] * 0.01

    anObject = Testing([[1],[2],[3],[4],[5]])

--

Attempted parallel version

    import math
    import multiprocessing as mp

    class Testing(object):
        def __init__(self, inputSet):
            self.set = inputSet
            self.subsets = []
            pool = mp.Pool()
            pool.map(self.__buildParallel__, self.set)

        def __buildParallel__(self, i):
            degree = self.__degree__(self.set[i])
            self.subsets[i].append(degree)

        def __degree__(self, element):
            return element[0] * 0.01

    anObject = Testing([[1],[2],[3],[4],[5]])

When I run this, the program never produces a result and all processors go to 100% usage. What could be happening?

thanks


Solution

  • Changes made to an object inside a worker process apply only to that process's copy of the object; they do not change the original object in the parent process. With pool.map, the return values of the mapped function are collected and returned to the caller as a list, in the same order as the inputs, so return the result instead of appending to self.subsets.

    Note that creating the pool processes has overhead, so this trivial example would normally run slower in parallel. I've added a sleep to simulate parallel work that takes some time.

    import math
    import multiprocessing as mp
    import time
    
    class Testing(object):
        def __init__(self, inputSet):
            self.set = inputSet
            with mp.Pool() as pool:
                self.subsets = pool.map(self.__buildParallel__,self.set)
    
        def __buildParallel__(self, i): # i is just one element, not whole set
            time.sleep(5)  # build time would be 25+ seconds if not parallel
            return self.__degree__(i)
                                   
        def __degree__(self, element):
            return element[0] * 0.01
    
    if __name__ == '__main__':
        anObject = Testing([[1],[2],[3],[4],[5]])
        print(anObject.set)
        print(anObject.subsets)
    

    Result (after about 5 seconds, provided the pool has at least five worker processes):

    [[1], [2], [3], [4], [5]]
    [0.01, 0.02, 0.03, 0.04, 0.05]
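
The point about per-process copies can be checked with a small sketch. It is not part of the answer's code: `results`, `record_and_return`, and `run_demo` are hypothetical names chosen for illustration, and the pool size of 2 is arbitrary. The worker both appends to a module-level list and returns a value; only the return values make it back to the parent.

```python
import multiprocessing as mp

# Module-level list. Each worker process works on its own copy of it.
results = []

def record_and_return(x):
    """Hypothetical worker: mutate the list, then return the same value."""
    results.append(x * 0.01)  # changes only this worker's copy
    return x * 0.01           # pool.map sends this back to the parent

def run_demo():
    """Return (values collected by pool.map, the parent's view of `results`)."""
    with mp.Pool(2) as pool:
        returned = pool.map(record_and_return, [1, 2, 3, 4, 5])
    return returned, list(results)

if __name__ == '__main__':
    returned, parent_side = run_demo()
    print(returned)     # [0.01, 0.02, 0.03, 0.04, 0.05]
    print(parent_side)  # []
```

Run as a script, this prints the five returned values followed by an empty list, because the appends happened only in the workers' copies. The same reasoning applies to `self.subsets` in the question's attempt.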