I am trying to create a script that calls a function from a separate module to do parallel processing.
My 'top-level' script looks like this:
    from hydrology import model, descriptors

    if __name__ == "__main__":
        datafile = r"C:\folder\datafile.shp"
        myModel = model.Model(data=datafile)
        res = descriptors.watershed_pll(myModel)
The descriptors module looks like this:
    from multiprocessing import Pool
    from arcfunc import multi_watershed

    def watershed_pll(model):
        pool = Pool()
        for key, val in model.stations.iteritems():
            res = pool.apply_async(multi_watershed(val, key))
        pool.close()
        pool.join()
        return res
As you can see, the function to run in parallel is imported from the module arcfunc, the function carrying out the parallelisation is inside the module descriptors, and the script running everything is separate again.
There are no exceptions when I run the script, but I have two problems:
I suspect that my architecture is complicating things; however, it is important that the parallelisation function remains in a separate module.
Any suggestions?
Instead of passing the function and its arguments to apply_async, the code calls multi_watershed directly (in the main process) and passes the function's return value to apply_async. Pass the function and the arguments separately.
Replace the following line:

    res = pool.apply_async(multi_watershed(val, key))

with:

    res = pool.apply_async(multi_watershed, (val, key))
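
Note that even with this fix, watershed_pll keeps only the AsyncResult of the last task and never calls .get() on it, so the workers' results are discarded. Here is a minimal sketch of how the function could collect every result (assuming multi_watershed returns a picklable value; the async_results name is just for illustration):

    from multiprocessing import Pool
    from arcfunc import multi_watershed

    def watershed_pll(model):
        pool = Pool()
        # Submit one task per station and keep every AsyncResult handle.
        # (.iteritems() matches the Python 2 code above; use .items() on Python 3.)
        async_results = {}
        for key, val in model.stations.iteritems():
            async_results[key] = pool.apply_async(multi_watershed, (val, key))
        pool.close()
        pool.join()
        # .get() returns each task's result, or re-raises any exception the
        # worker hit, so failures are no longer silently swallowed.
        return dict((key, ar.get()) for key, ar in async_results.items())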