python multiprocessing enumerate

List comprehension syntax for pool.starmap_async()


Looking at an example here:

https://www.machinelearningplus.com/python/parallel-processing-python/

There is a function definition which is to be parallelised:

# Step 1: Redefine, to accept `i`, the iteration number
def howmany_within_range2(i, row, minimum, maximum):
    """Returns how many numbers lie within `maximum` and `minimum` in a given `row`"""
    count = 0
    for n in row:
        if minimum <= n <= maximum:
            count = count + 1
    return (i, count)

The starmap_async example is given as below:

results = pool.starmap_async(howmany_within_range2, [(i, row, 4, 8) for i, row in enumerate(data)]).get()

I am a bit confused by this syntax, particularly the "i" parameter and how the enumerate part works.

Also, why does the apply_async() example use a pool.join() statement when the map_async() example doesn't?


Solution

  • Breaking this down a little,

    data = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9],
    ]
    
    arguments = [(i, row, 4, 8) for i, row in enumerate(data)]
    
    print(arguments)
    

    outputs (formatted):

    [
      (0, [1, 2, 3], 4, 8),
      (1, [4, 5, 6], 4, 8),
      (2, [7, 8, 9], 4, 8),
    ]
    

    which are the tuples howmany_within_range2 will be executed with, i.e.

    howmany_within_range2(0, [1, 2, 3], 4, 8)
    howmany_within_range2(1, [4, 5, 6], 4, 8)
    howmany_within_range2(2, [7, 8, 9], 4, 8)
    

    but in parallel.

    enumerate is used here to get each row's index in the data list alongside the row itself; otherwise you'd just get a bunch of results without an easy way to associate them with the original data rows.
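
To tie the pieces together, here is a minimal runnable sketch of the above. It reuses howmany_within_range2 and the sample data from this answer; the helper name run_in_parallel and the choice of 2 worker processes are my own assumptions, not something from the linked article. itertools.starmap is included only as a sequential stand-in to show how each tuple gets unpacked into positional arguments.

```python
from itertools import starmap
from multiprocessing import Pool


def howmany_within_range2(i, row, minimum, maximum):
    """Return (i, count): how many numbers in `row` lie within [minimum, maximum]."""
    count = sum(1 for n in row if minimum <= n <= maximum)
    return (i, count)


# Sample data, reused from the answer above.
data = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

# One argument tuple per row; enumerate supplies the row index i.
arguments = [(i, row, 4, 8) for i, row in enumerate(data)]

# Sequential equivalent: starmap unpacks each tuple into positional
# arguments, i.e. howmany_within_range2(0, [1, 2, 3], 4, 8) and so on.
sequential_results = list(starmap(howmany_within_range2, arguments))


def run_in_parallel(arguments, processes=2):
    """Hypothetical helper: make the same calls, spread over worker processes."""
    with Pool(processes=processes) as pool:
        # starmap_async returns an AsyncResult immediately;
        # .get() then blocks until every call has finished.
        results = pool.starmap_async(howmany_within_range2, arguments).get()
    # Each result is an (i, count) pair, so dict() maps row index -> count.
    return dict(results)


if __name__ == "__main__":
    print(sequential_results)
    print(run_in_parallel(arguments))
```

On the pool.join() part of the question: .get() already waits for all the results, so nothing more is needed to collect them here. Also note that starmap_async returns its results in input order anyway; carrying the index along matters more when results can arrive in any order, for example with apply_async callbacks.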