python, multiprocessing, geopandas

Multiple inputs and outputs: how can I parallelize this?


Pool.map() accepts only one iterable argument. That's not my case, and I find it difficult to combine my inputs into a single iterable.

mp.Process() only allows me one output variable, which doesn't fit my case either: my outputs are 4 lists of GeoDataFrames that are created during the parallelization.

Which function in multiprocessing can I use to parallelize it?


Solution

  • You can zip the multiple iterable arguments together and pass the result to Pool.map as its single iterable argument; the func argument to map is then a function that takes one tuple as its argument. Alternatively, you can call pool.starmap, in which case the func argument to starmap is a function that takes n separate arguments, and starmap unpacks each tuple into them.

    This demonstrates the two techniques:

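    from multiprocessing import Pool
    
    def worker_1(t):
        x, y, z = t  # map passes one tuple; unpack it here
        return x + y + z  # placeholder for the real work
    
    def worker_2(x, y, z):
        # starmap unpacks each tuple into separate arguments
        return x + y + z  # placeholder for the real work
    
    if __name__ == '__main__':
        # Materialize the zipped arguments so both calls can reuse them
        args = list(zip(range(1, 101), range(2, 102), range(3, 103)))
        # The with-block shuts the pool down when it exits
        with Pool() as pool:
            results_1 = pool.map(worker_1, args)
            results_2 = pool.starmap(worker_2, args)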
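The question also asks about multiple outputs. Because map and starmap return a list holding whatever the worker returned for each input, a worker can return a tuple of several values, and zip(*results) then regroups those tuples into one sequence per output. The sketch below assumes four outputs and uses plain Python values in place of the GeoDataFrames from the question so it runs without geopandas; worker_4 and its computations are hypothetical placeholders.

```python
from multiprocessing import Pool


def worker_4(x, y, z):
    # Hypothetical worker: returns four separate results for one input tuple.
    # In the question these would be GeoDataFrames; plain values stand in here.
    out_a = x + y + z
    out_b = x * y * z
    out_c = [x, y, z]
    out_d = max(x, y, z)
    return out_a, out_b, out_c, out_d


if __name__ == '__main__':
    args = zip(range(1, 101), range(2, 102), range(3, 103))
    with Pool() as pool:
        # One 4-tuple per input, in the same order as args
        results = pool.starmap(worker_4, args)

    # Transpose: list of 4-tuples -> four lists, one per output
    list_a, list_b, list_c, list_d = (list(col) for col in zip(*results))
    print(list_a[:3])  # [6, 9, 12]
```

If each output is a GeoDataFrame, the per-input pieces collected in each list could then be combined into one frame per output, for example with pandas.concat.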