List comprehension syntax for pool.starmap_async()

Looking at an example here:

https://www.machinelearningplus.com/python/parallel-processing-python/

There is a function definition which is to be parallelised:

# Step 1: Redefine, to accept `i`, the iteration number
def howmany_within_range2(i, row, minimum, maximum):
    """Returns how many numbers lie within `maximum` and `minimum` in a given `row`"""
    count = 0
    for n in row:
        if minimum <= n <= maximum:
            count = count + 1
    return (i, count)

The starmap_async example is given as below:

results = pool.starmap_async(howmany_within_range2, [(i, row, 4, 8) for i, row in enumerate(data)]).get()

I am a bit confused by this syntax here, particularly the "i" parameter and how this enumerate syntax works.

Also, the apply_async() example uses a pool.join() statement, but the map_async() example doesn't use one?

Solution

Breaking this down a little,

data = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

arguments = [(i, row, 4, 8) for i, row in enumerate(data)]

print(arguments)

outputs (formatted)

[
  (0, [1, 2, 3], 4, 8),
  (1, [4, 5, 6], 4, 8),
  (2, [7, 8, 9], 4, 8),
]

which are the tuples howmany_within_range2 will be executed with, i.e.

howmany_within_range2(0, [1, 2, 3], 4, 8)
howmany_within_range2(1, [4, 5, 6], 4, 8)
howmany_within_range2(2, [7, 8, 9], 4, 8)

but in parallel.
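To make the argument unpacking concrete, here is a minimal sequential sketch. It uses itertools.starmap from the standard library, which spreads each tuple into positional arguments in the same way, only in a single process rather than across pool workers. The sample data is the illustrative list from above, not the data from the linked article.

```python
from itertools import starmap


def howmany_within_range2(i, row, minimum, maximum):
    """Return (i, count of values in `row` within [minimum, maximum])."""
    count = 0
    for n in row:
        if minimum <= n <= maximum:
            count = count + 1
    return (i, count)


# Illustrative sample data (an assumption, not the article's data)
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arguments = [(i, row, 4, 8) for i, row in enumerate(data)]

# starmap calls howmany_within_range2(*args) for each tuple in `arguments`
results = list(starmap(howmany_within_range2, arguments))
print(results)  # [(0, 0), (1, 3), (2, 2)]
```

Each result is an (i, count) tuple, so the first element says which row the count belongs to.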

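enumerate yields (index, item) pairs, so in the comprehension i is the position of row within the data list. It is used here so that each result carries the index of the row it was computed from; otherwise you'd just get a bunch of counts without an easy way to associate them with the original data rows.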
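As a small illustration of what enumerate produces and how the returned index can be used to look up the original row again, consider the sketch below. The variable names and the hard-coded results list are illustrative assumptions; the results are written in the (i, count) shape that howmany_within_range2 returns.

```python
# Illustrative sample data (an assumption, not the article's data)
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# enumerate yields (index, item) pairs
pairs = list(enumerate(data))
print(pairs)  # [(0, [1, 2, 3]), (1, [4, 5, 6]), (2, [7, 8, 9])]

# Hypothetical results in the (i, count) shape returned by howmany_within_range2
results = [(0, 0), (1, 3), (2, 2)]

# The index in each result points back at the row it was computed from
lookup = {i: (data[i], count) for i, count in results}
for i, (row, count) in lookup.items():
    print(f"row {i} = {row} has {count} value(s) between 4 and 8")
```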