Looking at an example here:
https://www.machinelearningplus.com/python/parallel-processing-python/
There is a function definition which is to be parallelised:
# Step 1: Redefine, to accept `i`, the iteration number
def howmany_within_range2(i, row, minimum, maximum):
    """Returns how many numbers lie within `maximum` and `minimum` in a given `row`"""
    count = 0
    for n in row:
        if minimum <= n <= maximum:
            count = count + 1
    return (i, count)
The starmap_async example is given as:
results = pool.starmap_async(howmany_within_range2, [(i, row, 4, 8) for i, row in enumerate(data)]).get()
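A self-contained version of that call might look like the sketch below. The small `data` list and `processes=2` are assumptions for illustration (the article builds its own `data`). The `if __name__ == "__main__":` guard matters on platforms that start workers with spawn (Windows, macOS). Note that `AsyncResult.get()` blocks until every task has finished.

```python
import multiprocessing as mp


def howmany_within_range2(i, row, minimum, maximum):
    """Return (i, count), where count is how many numbers in `row` lie in [minimum, maximum]."""
    count = 0
    for n in row:
        if minimum <= n <= maximum:
            count = count + 1
    return (i, count)


if __name__ == "__main__":
    # Assumed sample input, not the article's data
    data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

    # Leaving the `with` block terminates the pool; that is safe here because
    # .get() has already waited for every task to complete.
    with mp.Pool(processes=2) as pool:
        async_result = pool.starmap_async(
            howmany_within_range2,
            [(i, row, 4, 8) for i, row in enumerate(data)],
        )
        results = async_result.get()  # blocks until all results are ready

    print(results)  # [(0, 0), (1, 3), (2, 2)]
```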
I am a bit confused by this syntax here, particularly the "i" parameter and how this enumerate syntax works.
Also, why does the apply_async() example call pool.join() while the map_async() example doesn't?
Breaking this down a little,
data = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
]
arguments = [(i, row, 4, 8) for i, row in enumerate(data)]
print(arguments)
outputs (formatted)
[
(0, [1, 2, 3], 4, 8),
(1, [4, 5, 6], 4, 8),
(2, [7, 8, 9], 4, 8),
]
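To make the enumerate part concrete: enumerate() yields (index, item) pairs, and the `for i, row in ...` clause unpacks each pair. The comprehension can be written as the explicit loop below (same `data` as above):

```python
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# enumerate() yields (index, item) pairs: (0, [1, 2, 3]), (1, [4, 5, 6]), ...
print(list(enumerate(data)))

# The list comprehension is shorthand for this explicit loop
arguments = []
for i, row in enumerate(data):  # tuple-unpack each (index, row) pair
    arguments.append((i, row, 4, 8))

print(arguments == [(i, row, 4, 8) for i, row in enumerate(data)])  # True
```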
which are the argument tuples howmany_within_range2 will be called with, i.e.
howmany_within_range2(0, [1, 2, 3], 4, 8)
howmany_within_range2(1, [4, 5, 6], 4, 8)
howmany_within_range2(2, [7, 8, 9], 4, 8)
but in parallel.
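The "star" in starmap refers to that unpacking: each tuple is spread into positional arguments, f(*args). As a sequential sketch with no parallelism, the standard library's itertools.starmap produces the same list that Pool.starmap_async(...).get() returns, in the same input order:

```python
from itertools import starmap


def howmany_within_range2(i, row, minimum, maximum):
    count = 0
    for n in row:
        if minimum <= n <= maximum:
            count += 1
    return (i, count)


arguments = [(0, [1, 2, 3], 4, 8), (1, [4, 5, 6], 4, 8), (2, [7, 8, 9], 4, 8)]

# starmap calls howmany_within_range2(*args) for each tuple, one after another
sequential = list(starmap(howmany_within_range2, arguments))

# ...which is equivalent to unpacking by hand
explicit = [howmany_within_range2(*args) for args in arguments]

print(sequential)  # [(0, 0), (1, 3), (2, 2)]
```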
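enumerate is used here to get each row's index in the data list alongside the row itself; otherwise you'd just get a bunch of counts without an easy way to associate them with the original data rows. (Strictly, starmap_async(...).get() already returns results in input order; the index becomes essential in the apply_async() version, where callbacks can fire in whatever order the workers finish.)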
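On the pool.join() question: with starmap_async()/map_async(), calling .get() already blocks until all results are back, so no join() is needed to wait. With apply_async() plus a callback, nothing in the loop waits, so the parent calls pool.close() (no more tasks; required before join()) and then pool.join() to wait for the workers. The sketch below follows that pattern; the callback name collect_result, the sample `data`, and `processes=2` are assumptions for illustration:

```python
import multiprocessing as mp


def howmany_within_range2(i, row, minimum, maximum):
    count = 0
    for n in row:
        if minimum <= n <= maximum:
            count += 1
    return (i, count)


def collect_result(result):
    # Hypothetical callback: runs in the parent process, in the pool's
    # result-handler thread, once per finished task
    results.append(result)


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # assumed sample input
    results = []

    pool = mp.Pool(processes=2)
    for i, row in enumerate(data):
        pool.apply_async(howmany_within_range2, args=(i, row, 4, 8),
                         callback=collect_result)
    pool.close()  # no more tasks will be submitted; required before join()
    pool.join()   # wait until every submitted task (and its callback) is done

    # Callbacks can arrive in completion order, so sort by the index `i`
    results.sort(key=lambda pair: pair[0])
    counts = [count for _, count in results]
    print(counts)  # [0, 3, 2]
```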