Tags: python, python-multiprocessing, concurrent.futures, process-pool

Multiprocessing a function with multiple arguments


I'm diving into the multiprocessing world in Python.

After watching some videos, I came up with a question due to the nature of my function.

This function takes 4 arguments:

  1. The 1st argument is a file to be read; hence, I have a list of files to read.
  2. The following 2 arguments are two different dictionaries.
  3. The last argument is an optional argument, "debug_mode", which needs to be set to "True".

import time
import concurrent.futures

# process_data(file, signals_dict, parameter_dict, debug_mode=False)
file_list = [...]
t1 = time.time()
with concurrent.futures.ProcessPoolExecutor() as executor:
    executor.map(process_data, file_list)
t2 = time.time()

The question is: How can I specify the remaining parameters to the function?

Thanks in advance


Solution

  • The ProcessPoolExecutor.map documentation is weak. The worker accepts a single parameter, so if your target function has a different call signature, you need to write an intermediate worker that is passed a container and knows how to expand it into the parameter list. The documentation also fails to make it clear that you need to wait for the jobs to complete before closing the pool: if you start the jobs and then exit the pool's with block, the pool is terminated.

    import concurrent.futures
    import os

    def process_data(a, b, c, d):
        # Stand-in for the real function; prints which process ran it.
        print(os.getpid(), a, b, c, d)
        return a

    def _process_data_worker(p):
        # map() passes a single item, so unpack it into the real signature.
        return process_data(*p)

    if __name__ == "__main__":
        # Each element is the complete argument list for one call.
        file_list = [["fooa", "foob", "fooc", "food"],
                     ["bara", "barb", "barc", "bard"]]

        with concurrent.futures.ProcessPoolExecutor() as executor:
            results = executor.map(_process_data_worker, file_list)

            # Consume the results before leaving the with block so the
            # pool is not shut down while jobs are still running.
            for result in results:
                print('result', result)
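
  • Alternatively, since in your case only the file changes from call to call while the two dictionaries and debug_mode stay fixed, you can freeze the constant arguments with functools.partial and hand the resulting callable straight to map. A minimal sketch, where signals_dict, parameter_dict and the file names are placeholders for your own (picklable) objects:

    import concurrent.futures
    from functools import partial

    def process_data(file, signals_dict, parameter_dict, debug_mode=False):
        # Placeholder body; your real processing goes here.
        return file, debug_mode

    if __name__ == "__main__":
        signals_dict = {}                          # stand-in dictionaries
        parameter_dict = {}
        file_list = ["a.dat", "b.dat", "c.dat"]    # hypothetical file names

        # Bind the arguments that are identical for every file, so map()
        # only has to supply the varying first argument.
        worker = partial(process_data,
                         signals_dict=signals_dict,
                         parameter_dict=parameter_dict,
                         debug_mode=True)

        with concurrent.futures.ProcessPoolExecutor() as executor:
            for result in executor.map(worker, file_list):
                print('result', result)

    The same effect can be had with executor.map(process_data, file_list, itertools.repeat(signals_dict), itertools.repeat(parameter_dict), itertools.repeat(True)), since map accepts one iterable per positional argument.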