
Python Multiprocessing - is this method a valid way to configure and access process values in a safe way?


Is the strategy below a satisfactory way to spawn several new processes, each with its own multiprocessing Value, for later access in a process-safe manner? Can each of the processes write to its entry in the process_outputs list independently and asynchronously? It seems to work, but I have not seen this method used.

import multiprocessing as mp

num_processes = 4  # e.g. four worker processes


def process_function(output):
    # ... do something ...
    output.value = something  # placeholder for the computed result


if __name__ == '__main__':
    processes = []
    process_outputs = []
    for i in range(num_processes):
        process_outputs.append(mp.Value('l', 0))
        processes.append(mp.Process(target=process_function, args=(process_outputs[i],)))
    for process in processes:
        process.start()
    for process in processes:
        process.join()
    for j in range(num_processes):
        if process_outputs[j].value == something:
            pass  # do something
        else:
            pass  # do something else

Solution

  • It looks like you just want to collect the return values of functions that run in parallel. That can be done much more simply:

    import multiprocessing
    
    
    def process_function(n):
        return n ** 2  # just computing a square for example
    
    
    if __name__ == '__main__':
        data = [1, 2, 3, 4, 5]
    
        pool = multiprocessing.Pool()
        results = pool.map(process_function, data)
        pool.close()
        pool.join()
    
        print(results)
    

    If you need access to the results as they become available, instead of waiting for all the functions to complete:

    import multiprocessing
    from random import randint
    from time import sleep, time
    
    
    def process_function(n):
        start = time()
        sleep(d := randint(1, 5))
        return n, n ** 2, d, start  # a tuple of n, its square, delay and start time
    
    
    if __name__ == '__main__':
        data = [1, 2, 3, 4, 5]
    
        pool = multiprocessing.Pool()
        for result in pool.imap_unordered(process_function, data):
            print(result)
    
        pool.close()
        pool.join()
    

    Example output:

    (2, 4, 1, 1688715880.890775)
    (1, 1, 1, 1688715880.888705)
    (3, 9, 3, 1688715880.9089909)
    (5, 25, 3, 1688715880.9211953)
    (4, 16, 5, 1688715880.9120545)
    

    This shows that the inputs are processed in parallel and that each result is handled as soon as it becomes available. The timestamps show that all calls were started within moments of each other, and the delays show that the results arrive in the order their random sleep times predict.

    This example doesn't act differently depending on the returned result; it just prints all of them. It is of course trivial to replace the call to print() with something more clever, for example branching on the value as in the sketch below.
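    As a minimal sketch of such branching (THRESHOLD is a made-up cutoff purely for illustration, and process_function is simplified to return only the input and its square):

    import multiprocessing

    THRESHOLD = 10  # hypothetical cutoff, just for illustration


    def process_function(n):
        return n, n ** 2  # as above: return the input and its square


    if __name__ == '__main__':
        data = [1, 2, 3, 4, 5]

        pool = multiprocessing.Pool()
        for n, square in pool.imap_unordered(process_function, data):
            # act on each result as soon as it arrives, instead of just printing it
            if square >= THRESHOLD:
                print(f'{n}: square {square} is at or above the threshold')
            else:
                print(f'{n}: square {square} is below the threshold')

        pool.close()
        pool.join()
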

    Edit: keep in mind that this is just an example, kept brief for simplicity. As pointed out in the comments, if there is any possibility of exceptions in the called function (hardly the case here), you'll want to ensure your code doesn't hang, by using a context manager like this (which also obviates the need to close and join the pool):

        with multiprocessing.Pool() as pool:
            for result in pool.imap_unordered(process_function, data):
                print(result)
    

    There are more things to add and say here, but they go well beyond the scope of the question.