Tags: python, performance, queue, multiprocessing, pipe

Multiprocessing - Pipe vs Queue


What are the fundamental differences between queues and pipes in Python's multiprocessing package?

In what scenarios should one choose one over the other? When is it advantageous to use Pipe()? When is it advantageous to use Queue()?


Solution

  • What are the fundamental differences between queues and pipes in Python's multiprocessing package?

    Major Edit of this answer (CY2024): concurrency

    In modern Python versions, the only real use case for multiprocessing is when your producers and consumers need to communicate with each other.

    If you only need to run Python code concurrently, use concurrent.futures.

    This example uses concurrent.futures to make four calls to do_slow_thing(), each of which has a one-second delay. If your machine has at least four cores, these calls, which would take four seconds run back to back, finish in about one second.

    By default (max_workers=None), ProcessPoolExecutor uses up to as many worker processes as your machine has CPU cores.
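
    If you would rather not depend on that default, you can pass max_workers yourself. The following is only a minimal sketch: square() and square_all() are hypothetical names made up for illustration, and executor.map() is used here as a shorter alternative to submit() when you just want the results in input order.

```python
import concurrent.futures
import os


def square(number: int) -> int:
    """Return number squared (a stand-in for real work)."""
    return number * number


def square_all(numbers, max_workers=None):
    """Square every number in a process pool; results keep the input order.

    max_workers=None lets ProcessPoolExecutor choose its default pool size.
    """
    with concurrent.futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(square, numbers))


if __name__ == "__main__":
    # os.cpu_count() may return None if the count cannot be determined
    print("CPUs reported by the OS:", os.cpu_count())
    # Default pool size
    print(square_all([1, 2, 3, 4]))
    # Explicitly capped at two worker processes
    print(square_all([1, 2, 3, 4], max_workers=2))
```

    Capping max_workers is mostly useful when each worker is memory-hungry or when you share the machine with other jobs.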

    import concurrent.futures
    import time
    
    def do_slow_thing(input_str: str) -> str:
        """Return modified input string after a 1-second delay"""
        if isinstance(input_str, str):
            time.sleep(1)
            return "1-SECOND-DELAY " + input_str
        else:
            return "INPUT ERROR"
    
    if __name__=="__main__":
    
        # Define some inputs for process pool
        all_inputs = [
            "do",
            "foo",
            "moo",
            "chew",
        ]
    
        # Spawn a process pool with the default number of workers...
        with concurrent.futures.ProcessPoolExecutor(max_workers=None) as executor:
            # For each string in all_inputs, call do_slow_thing() 
            #    in parallel across the process worker pool
            these_futures = [executor.submit(do_slow_thing, ii) for ii in all_inputs]
            # Wait for all processes to finish
            concurrent.futures.wait(these_futures)
    
        # Get the results from the process pool execution... each
        # future.result() call is the return value from do_slow_thing()
        string_outputs = [future.result() for future in these_futures]
        for tmp in string_outputs:
            print(tmp)
    

    With at least four CPU cores, you'll see this printed after roughly one second:

    $ time python stackoverflow.py
    1-SECOND-DELAY do
    1-SECOND-DELAY foo
    1-SECOND-DELAY moo
    1-SECOND-DELAY chew
    
    real    0m1.058s
    user    0m0.060s
    sys     0m0.017s
    $
    

    Original Answer

    At this point, the only major use case for multiprocessing is letting your producers and consumers talk to each other during execution. Most people don't need that. However, if you want communication via queues or pipes, you can find my original answer to the OP's question below (it profiles how fast they are).
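
    For readers who do need that kind of communication, here is a minimal sketch of one producer sending a few strings to the parent process, first over a Queue and then over a Pipe. It is not the profiling code from the original answer. The helper names (queue_producer, pipe_producer, drain) and the use of None as an end-of-stream sentinel are assumptions made for this illustration.

```python
import multiprocessing


def queue_producer(queue, items):
    """Put each item on the queue, then a None sentinel to signal the end."""
    for item in items:
        queue.put(item)
    queue.put(None)


def pipe_producer(conn, items):
    """Send each item through the pipe, then a None sentinel, then close."""
    for item in items:
        conn.send(item)
    conn.send(None)
    conn.close()


def drain(receive):
    """Call receive() until the None sentinel arrives; return the items."""
    received = []
    while True:
        item = receive()
        if item is None:
            return received
        received.append(item)


if __name__ == "__main__":
    words = ["do", "foo", "moo", "chew"]

    # Queue: can be shared by many producers and many consumers
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=queue_producer, args=(queue, words))
    proc.start()
    print("queue:", drain(queue.get))
    proc.join()

    # Pipe: exactly two endpoints; duplex=False makes it one-way
    # (first connection receives, second connection sends)
    recv_conn, send_conn = multiprocessing.Pipe(duplex=False)
    proc = multiprocessing.Process(target=pipe_producer, args=(send_conn, words))
    proc.start()
    print("pipe:", drain(recv_conn.recv))
    proc.join()
```

    The practical difference shown here is the shape of the channel: a Pipe() connects exactly two endpoints, while a Queue() can be handed to any number of producer and consumer processes.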

    The existing comments on this answer refer to the original answer below.