I was trying to use parallelization using mutiprocessing to achieve good execution time in an iteration function but I was unable to do that.
My function is this
for i in range(len(self.set)):
degree = self.__degree__(self.set[i])
self.subsets.append(degree)
I call it using
self.__build__()
--
I tried to make a parallel version like that:
def __buildParallel__(self, i):
degree = self.__degree__(self.set[i])
self.subsets[i].append(degree)
I call the parallel version using
import multiprocessing as mp
pool = mp.Pool()
pool.map(self.__buildParallel__, self.set)
Example:
I want to parallelize the build function
import math
class Testing(object):
def __init__(self, inputSet):
self.set = inputSet
self.subsets = []
self.__build__()
def __build__(self):
for i in range(len(self.set)):
degree = self.__degree__(self.set[i])
self.subsets.append(degree)
def __degree__(self, element):
return element[0] * 0.01
anObject = Testing([[1],[2],[3],[4],[5]])
--
Attempted parallel version
import math
import multiprocessing as mp
class Testing(object):
def __init__(self, inputSet):
self.set = inputSet
self.subsets = []
pool = mp.Pool()
pool.map(self.__buildParallel__, self.set)
def __buildParallel__(self, i):
degree = self.__degree__(self.set[i])
self.subsets[i].append(degree)
def __degree__(self, element):
return element[0] * 0.01
anObject = Testing([[1],[2],[3],[4],[5]])
What happens is that the program simply does not run and all processors reach 100% usage. What could be happening?
thanks
Any modifications made to a class in a process do so in the context of that process and won't change the original class. With pool.map
, the return values of the mapped function will be collected and returned.
Note that there is overhead in creating the pool processes so this trivial example would normally run slower in parallel, but I've added a sleep to appear that the parallel work takes some time.
import math
import multiprocessing as mp
import time
class Testing(object):
def __init__(self, inputSet):
self.set = inputSet
with mp.Pool() as pool:
self.subsets = pool.map(self.__buildParallel__,self.set)
def __buildParallel__(self, i): # i is just one element, not whole set
time.sleep(5) # build time would be 25+ seconds if not parallel
return self.__degree__(i)
def __degree__(self, element):
return element[0] * 0.01
if __name__ == '__main__':
anObject = Testing([[1],[2],[3],[4],[5]])
print(anObject.set)
print(anObject.subsets)
Result (after 5 seconds):
[[1], [2], [3], [4], [5]]
[0.01, 0.02, 0.03, 0.04, 0.05]