I am trying to use multiprocessing on a function in my script, and I think the problem is with my structure and syntax for multiprocessing. The goal is to loop through a list and copy out raster/TIFF images from thirteen different spatial geodatabases. Both the inputs and outputs are stored in individual geodatabases to speed up the process and prevent crashing. Here is my sample script:
import arcpy, time, os
from arcpy import env
from collections import defaultdict
import multiprocessing
from multiprocessing import pool

def copy_rasters_3b(P6_GDBs, x, c):
    mosaic_gdb = os.path.join('{}/mosaic_{}.gdb/{}'.format(P6_GDBs, x, c))
    final_gdb = os.path.join('{}/final_{}.gdb/{}'.format(P6_GDBs, x, c))
    final_tiff = os.path.join('{}/{}.tif'.format(P6_GDBs, c))

    print "---Copying Rasters Started---"
    start_time = time.time()
    arcpy.CopyRaster_management(mosaic_gdb, final_gdb, "", "", "", "NONE", "NONE", "8_BIT_UNSIGNED", "NONE", "NONE", "", "NONE")
    arcpy.CopyRaster_management(mosaic_gdb, final_tiff, "", "", "", "NONE", "NONE", "8_BIT_UNSIGNED", "NONE", "NONE", "TIFF", "NONE")
    print("--- " + c + " Complete %s seconds ---" % (time.time() - start_time))
### Main ###
def main():
    P6_DIR = "D:/P6_SecondRun"
    P6_GDBs = "D:/P6_GDBs"
    Categories = ['CRP', 'FORE', 'INR', 'IR', 'MO', 'PAS', 'TCI', 'TCT', 'TG', 'WAT', 'WLF', 'WLO', 'WLT']
    rasters = defaultdict(list)

    # Environments
    arcpy.env.overwriteOutput = True
    arcpy.env.snapRaster = "D:/A__Snap/Phase6_Snap.tif"
    arcpy.env.outputCoordinateSystem = arcpy.SpatialReference(102039) #arcpy.SpatialReference(3857)

    pool = multiprocessing.Pool(processes=5)
    for c, x in zip(Categories, range(1, 14)):
        pool.map(copy_rasters_3b(P6_GDBs, x, c), 14)
        pool.close()

############### EXECUTE MAIN() ###############
if __name__ == "__main__":
    main()
This script initiates five processes in the background but ultimately fails after the first two instances. I'm running ArcMap 10.4 x64 with 64-bit Python 2.7.
First, you should be invoking pool.map (once) with a function and an iterable that provides arguments for the function. The reason for this is so that the actual function call is made in the sub-processes. Instead, you are actually calling the function directly and passing its result to pool.map, which isn't useful.
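To make the distinction concrete, here's a minimal, self-contained sketch using a toy square function (nothing arcpy-specific):

from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":
    pool = Pool(processes=2)
    # Wrong: square(2) runs here in the parent process, and its result (4)
    # is handed to pool.map, which fails because 4 is not callable:
    # pool.map(square(2), [1, 2, 3])
    # Right: pass the function itself plus an iterable of arguments; the
    # worker processes make the calls square(1), square(2), square(3):
    results = pool.map(square, [1, 2, 3])
    print(results)  # [1, 4, 9]
    pool.close()
    pool.join()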
Second, you're calling pool.close inside the loop, then calling pool.map again in the next iteration. The pattern with the pool object is to submit all the jobs to it, then call pool.close when you're done submitting (and pool.join to wait for all the jobs to complete). In this case, all the jobs will be given to it with a single .map method invocation.
Your code could look something like this:
def copy_rasters_3b(arg_tuple):
    P6_GDBs, x, c = arg_tuple  # Break apart argument tuple
    ...                        # Rest same as before

pool = multiprocessing.Pool(processes=5)
pool.map(copy_rasters_3b, [(P6_GDBs, x, c) for x, c in enumerate(Categories, 1)])
pool.close()
pool.join()
Breaking the list comprehension down: Categories is a list of strings, so enumerate(Categories, 1) produces a numbered sequence of tuples (N, member), where N is the position in the list (starting at 1 here, to match the range(1, 14) in your original loop) and member is the value from the list. We're combining those two with the constant value P6_GDBs in each result tuple. The final result of the comprehension is then a list of 3-tuples.
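For example, with a hypothetically shortened Categories list, the comprehension builds a jobs list like this:

P6_GDBs = "D:/P6_GDBs"
Categories = ['CRP', 'FORE', 'INR']  # shortened for illustration
jobs = [(P6_GDBs, x, c) for x, c in enumerate(Categories, 1)]
print(jobs)
# [('D:/P6_GDBs', 1, 'CRP'), ('D:/P6_GDBs', 2, 'FORE'), ('D:/P6_GDBs', 3, 'INR')]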
map will provide a single element from its iterable argument to the function on each invocation. However, that single element is the 3-tuple, so the function here needs to break the one argument apart into its constituent parts. (On Python 3.3+ you could use pool.starmap, which unpacks the tuples for you, but that isn't available in Python 2.7.)
Finally, you should drop the following import line, as it's confusing and unnecessary. Here you import the name pool, but then later (without ever using it) you overlay it with a new value by assigning to pool as a variable name:
from multiprocessing import pool
[...]
pool = multiprocessing.Pool(processes=5)
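If you do want a shorter spelling, import the Pool class itself; then the variable assignment no longer shadows an imported name:

from multiprocessing import Pool

pool = Pool(processes=5)  # no collision with the imported name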