
How to properly start parallel execution of two functions over multiple arguments?


I am looking for a way to run two functions in parallel, each over its own set of arguments. I use pool.map to achieve this: I create two separate processes, and each process starts a pool that executes the map. This works - the order of execution is a bit wild, but I will save that for another question.

Now I have also found another approach here (see the first answer). There, only one pool is used and map_async is called twice in a row. So I was wondering whether there is a preferred way of doing this. I have read (sadly I don't remember where) that it is best to use only one pool, which would mean the second approach (using only one pool) is better. But when I measure the time, the first approach (using two pools in separate processes) is actually a little bit faster. Additionally, in the first approach the functions really run in parallel, whereas in the second approach the first map_async call executes first and the second one only afterwards.

Here is my test code:

from multiprocessing import Process, Pool
import time
import os

multiple_pools = True
data = list(range(1, 11))


def func_a(param):
    print(f'running func_a in process {os.getpid()}')
    print(f'passed argument: {param}')
    print('calculating...\n')
    time.sleep(1.5)
    print('done\n')

def func_b(param):
    print(f'running func_b in process {os.getpid()}')
    print(f'passed argument: {param}')
    print('calculating...\n')
    time.sleep(2.5)
    print('done\n')

def execute_func(func, param):
    # Each call creates its own pool of 8 workers and maps func over param.
    with Pool(processes=8) as p:
        p.map(func, param)


if __name__ == '__main__':
    if not multiple_pools:
        t0 = time.time()
        p = Pool(processes=8)

        p.map_async(func_a, data)
        p.map_async(func_b, data)

        p.close()
        p.join()

        t1 = time.time()
        dt = t1 - t0
        print(f'time spent with one pool: {dt} s')

    else:
        t0 = time.time()
        p1 = Process(target=execute_func, args=(func_a, data))
        p2 = Process(target=execute_func, args=(func_b, data))

        p1.start()
        p2.start()

        p1.join()
        p2.join()
        p1.close()
        p2.close()

        t1 = time.time()
        dt = t1 - t0
        print(f'time spent with two pools, each inside an own process: {dt} s')

So again, my question: is one of these ways preferred over the other? Or are there other/better ways to do this?


Solution

  • First of all, I am assuming that when you use two pools you will be using the non-blocking map_async method. I would say that two pools of size N each, where you submit M tasks to each pool and all the tasks are identical (i.e. require the same resources as far as CPU, I/O, etc. are concerned), should be more or less equivalent, execution-time-wise, to submitting the same 2 * M tasks to a single pool of size 2 * N.

    The following program demonstrates the two cases:

    from multiprocessing import Pool
    import time
    
    QUARTER_SECOND_ITERATIONS = 5_000_000
    
    def quarter_second(x):
        # CPU-bound busy loop that takes roughly a quarter of a second:
        total = 0  # avoid shadowing the built-in sum
        for _ in range(QUARTER_SECOND_ITERATIONS):
            total += 1
        return x * x
    
    def callback(result):
        global callback_count
        print('Two pools result:', result)
        callback_count += 1
        if callback_count == 2:
            # Both map-async calls have completed:
            print('Two pools time:', time.time() - start_time)
    
    # required for Windows:
    if __name__ == '__main__':
        data1 = range(10)
        data2 = range(10, 20)
        # Two pools:
        pool1 = Pool(4)
        pool2 = Pool(4)
        callback_count = 0
        start_time = time.time()
        pool1.map_async(quarter_second, data1, callback=callback)
        pool2.map_async(quarter_second, data2, callback=callback)
        pool1.close()
        pool1.join()
        pool2.close()
        pool2.join()
    
        # One Pool:
        data = range(20)
        pool = Pool(8)
        start_time = time.time()
        result = pool.map(quarter_second, data)
        print('One pool result:', result)
        print('One pool time:', time.time() - start_time)
        pool.close()
        pool.join()
    

    Prints:

    Two pools result: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
    Two pools result: [100, 121, 144, 169, 196, 225, 256, 289, 324, 361]
    Two pools time: 1.4994373321533203
    One pool result: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361]
    One pool time: 1.4596436023712158
    

    I reran this several times, and most, but not all, of the time the one-pool case did slightly better. But I have many other processes running on my desktop that affect the results. I did not include in the total time the actual time to create the processing pool(s). Also, the map functions, depending on the size of the pools and the iterable arguments, can compute slightly different chunksize values when no explicit chunksize argument is specified, as is the case here. But that should have a negligible performance effect. All in all, I can't really see any significant performance difference between the one-pool and two-pool approaches given my assumptions.
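
    On the chunksize point: in CPython, map and map_async derive a default chunksize from the length of the iterable and the pool size (roughly len(iterable) divided by 4 * pool_size, rounded up), so the one-pool and two-pool runs above may split the work slightly differently. Here is a minimal sketch of how you could take that variable out of the comparison by passing an explicit chunksize to both setups; the worker is the same quarter_second function as above, and chunksize=1 is an arbitrary choice for illustration:

    from multiprocessing import Pool

    QUARTER_SECOND_ITERATIONS = 5_000_000

    def quarter_second(x):
        # Same CPU-bound worker as in the program above.
        total = 0
        for _ in range(QUARTER_SECOND_ITERATIONS):
            total += 1
        return x * x

    if __name__ == '__main__':
        # Two pools, explicit chunksize so both setups split the work identically:
        pool1 = Pool(4)
        pool2 = Pool(4)
        r1 = pool1.map_async(quarter_second, range(10), chunksize=1)
        r2 = pool2.map_async(quarter_second, range(10, 20), chunksize=1)
        print(r1.get() + r2.get())  # .get() blocks until each map_async has finished
        pool1.close(); pool2.close()

        # One pool, the same explicit chunksize:
        pool = Pool(8)
        print(pool.map(quarter_second, range(20), chunksize=1))
        pool.close()

    With the chunk splitting pinned down like this, any remaining difference between the two variants comes down to scheduling and pool-creation overhead rather than how the work is partitioned.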