I was experimenting with different start methods in the multiprocessing module and found something weird: changing the variable method from "spawn" to "fork" drops the execution time from about 9.5 seconds to just 0.5 seconds.
import multiprocessing as mp
from multiprocessing import Process, Value
from time import time

def increment_value(shared_integer):
    # Take the lock so the two children don't race on the shared counter.
    with shared_integer.get_lock():
        shared_integer.value += 1

if __name__ == "__main__":
    method = "spawn"
    mp.set_start_method(method)
    start = time()
    for _ in range(200):
        integer = Value("i", 0)
        procs = [
            Process(target=increment_value, args=(integer,)),
            Process(target=increment_value, args=(integer,)),
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        assert integer.value == 2
    print(f"{method} - Finished in {time() - start:.4f} seconds")
Output from different runs:
spawn - Finished in 9.4275 seconds
fork - Finished in 0.5316 seconds
I'm aware of how these two methods start a new child process (well explained here), but this difference still puts a big question mark in my head. I would like to know exactly which part of the code impacts the performance the most. Is it the pickling step in "spawn"? Does it have anything to do with the lock?
I'm running this code on Pop!_OS (Linux) with Python 3.11.
The fork method copies the parent process (on Linux, via copy-on-write fork(2)) and continues from that exact point, so the child already has the interpreter, the imported modules, and the shared Value in place. The spawn method instead starts a fresh Python interpreter, re-imports your main module, and pickles the Process target and its arguments to recreate that state in the child.
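You can observe the "recreates that state" part directly: under spawn the child re-imports the main module, so top-level code runs again in every child. A minimal sketch (the marker print is purely illustrative, not part of your benchmark):

import multiprocessing as mp

print("module top level executed")  # runs once under fork, once more per spawn child

def child():
    pass

if __name__ == "__main__":
    mp.set_start_method("spawn")
    p = mp.Process(target=child)
    p.start()
    p.join()
    # With spawn the marker prints twice (parent + re-importing child);
    # with fork it prints only once, because the child inherits the
    # already-imported module instead of re-importing it.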
As you can imagine, copying an already-running process is far cheaper than starting a new interpreter, re-importing every module, and unpickling the arguments. Your benchmark pays that startup cost 400 times (200 iterations × 2 processes), so the per-process overhead dominates; the lock contributes almost nothing either way.
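To check that the gap really is per-process startup overhead rather than anything in increment_value, you can time a do-nothing child under both methods in one run. A rough sketch using multiprocessing.get_context (which, unlike set_start_method, lets you use both start methods in the same process); the exact numbers will vary by machine:

import multiprocessing as mp
from time import time

def noop():
    pass  # no shared state, no lock: we measure pure process-creation cost

if __name__ == "__main__":
    for method in ("fork", "spawn"):
        ctx = mp.get_context(method)
        start = time()
        for _ in range(50):
            p = ctx.Process(target=noop)
            p.start()
            p.join()
        print(f"{method}: {time() - start:.4f} s for 50 no-op children")

The fork/spawn ratio should come out close to what you measured with the Value and the lock in place.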