What are the fundamental differences between queues and pipes in Python's multiprocessing package?
In what scenarios should one choose one over the other? When is it advantageous to use Pipe()? When is it advantageous to use Queue()?
What are the fundamental differences between queues and pipes in Python's multiprocessing package?
In modern Python versions, the only real use case for Python's multiprocessing package is when your producers and consumers need to communicate with each other.
If you only need Python concurrency, use concurrent.futures.
This example uses concurrent.futures to make four calls to do_slow_thing(), which has a one-second delay. If your machine has at least four cores, this series of function calls (four seconds in aggregate) takes only about one second to run.
By default, concurrent.futures.ProcessPoolExecutor spawns a number of worker processes corresponding to the number of CPU cores you have.
    import concurrent.futures
    import time


    def do_slow_thing(input_str: str) -> str:
        """Return modified input string after a 1-second delay"""
        if isinstance(input_str, str):
            time.sleep(1)
            return "1-SECOND-DELAY " + input_str
        else:
            return "INPUT ERROR"


    if __name__ == "__main__":
        # Define some inputs for process pool
        all_inputs = [
            "do",
            "foo",
            "moo",
            "chew",
        ]

        # Spawn a process pool with the default number of workers...
        with concurrent.futures.ProcessPoolExecutor(max_workers=None) as executor:
            # For each string in all_inputs, call do_slow_thing()
            # in parallel across the process worker pool
            these_futures = [executor.submit(do_slow_thing, ii) for ii in all_inputs]
            # Wait for all processes to finish
            concurrent.futures.wait(these_futures)

        # Get the results from the process pool execution... each
        # future.result() call is the return value from do_slow_thing()
        string_outputs = [future.result() for future in these_futures]
        for tmp in string_outputs:
            print(tmp)
With at least four CPU cores, you'll see this printed after roughly one second:
    $ time python stackoverflow.py
    1-SECOND-DELAY do
    1-SECOND-DELAY foo
    1-SECOND-DELAY moo
    1-SECOND-DELAY chew

    real    0m1.058s
    user    0m0.060s
    sys     0m0.017s
    $
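If you'd rather not rely on the default pool size, you can pass max_workers explicitly. The following is only a minimal sketch: pick_worker_count() and square() are hypothetical names I made up for illustration, and the exact default that ProcessPoolExecutor picks can vary with the Python version and platform. It also uses executor.map(), which hands back results in input order, as an alternative to collecting futures by hand.

```python
import concurrent.futures
import os


def pick_worker_count(num_tasks: int) -> int:
    """Hypothetical helper: use no more workers than tasks or CPU cores."""
    cpu_cores = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, min(num_tasks, cpu_cores))


def square(value: int) -> int:
    """Trivial stand-in for real work (an assumption for this sketch)."""
    return value * value


if __name__ == "__main__":
    inputs = [1, 2, 3, 4]
    workers = pick_worker_count(len(inputs))

    # Size the pool explicitly instead of passing max_workers=None
    with concurrent.futures.ProcessPoolExecutor(max_workers=workers) as executor:
        # executor.map() yields results in the same order as the inputs
        results = list(executor.map(square, inputs))

    print(f"workers={workers} results={results}")
```

Capping the pool at the number of tasks just avoids starting worker processes that would have nothing to do; whether that matters depends on your workload.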
At this point, the only major use case for multiprocessing is to let your producers and consumers talk to each other during execution. Most people don't need that. However, if you want communication via queues or pipes, you can find my original answer to the OP's question below (it profiles how fast they are).
The existing comments on this answer refer to that original answer below.
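For a quick feel of what "producers and consumers talking to each other" looks like, here is a rough sketch of both primitives. Pipe() gives you two connection objects joined point-to-point, while Queue() can be shared by several producers and consumers. The function names (pipe_worker, queue_worker, run_pipe_demo, run_queue_demo) and the None sentinel are my own assumptions for this illustration, not something the multiprocessing package prescribes.

```python
import multiprocessing


def pipe_worker(conn) -> None:
    """Receive one message over the pipe, reply in upper case, then close."""
    message = conn.recv()
    conn.send(message.upper())
    conn.close()


def queue_worker(task_queue, result_queue) -> None:
    """Double each item from task_queue until the None sentinel arrives."""
    while True:
        item = task_queue.get()
        if item is None:  # sentinel chosen for this sketch
            break
        result_queue.put(item * 2)


def run_pipe_demo(text: str) -> str:
    """Send text to a child process over a Pipe and return its reply."""
    parent_conn, child_conn = multiprocessing.Pipe()  # duplex by default
    proc = multiprocessing.Process(target=pipe_worker, args=(child_conn,))
    proc.start()
    parent_conn.send(text)
    reply = parent_conn.recv()
    proc.join()
    return reply


def run_queue_demo(numbers: list) -> list:
    """Feed numbers to a child process through a Queue and collect results."""
    task_queue = multiprocessing.Queue()
    result_queue = multiprocessing.Queue()
    proc = multiprocessing.Process(
        target=queue_worker, args=(task_queue, result_queue)
    )
    proc.start()
    for number in numbers:
        task_queue.put(number)
    task_queue.put(None)  # tell the worker to stop
    # Drain the results before join() so the child is never left blocked
    results = [result_queue.get() for _ in numbers]
    proc.join()
    return results


if __name__ == "__main__":
    print(run_pipe_demo("hello"))
    print(run_queue_demo([1, 2, 3]))
```

With a single consumer the queue results come back in the order they were submitted; with several consumers you should not count on that.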