I was experimenting with different start methods in the multiprocessing module and found something weird: changing the variable method from "spawn" to "fork" drops the execution time from about 9.5 seconds to just 0.5 seconds.
import multiprocessing as mp
from multiprocessing import Process, Value
from time import time

def increment_value(shared_integer):
    # Take the lock so the two children don't race on the shared counter.
    with shared_integer.get_lock():
        shared_integer.value += 1

if __name__ == "__main__":
    method = "spawn"
    mp.set_start_method(method)
    start = time()
    for _ in range(200):
        integer = Value("i", 0)
        procs = [
            Process(target=increment_value, args=(integer,)),
            Process(target=increment_value, args=(integer,)),
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        assert integer.value == 2
    print(f"{method} - Finished in {time() - start:.4f} seconds")
Output from different runs:
spawn - Finished in 9.4275 seconds
fork - Finished in 0.5316 seconds
I'm aware of how these two methods start a new child process (well explained here), but this difference still puts a big question mark in my head. I would like to know exactly which part of the code impacts the performance the most. Is it the pickling step in "spawn"? Does it have anything to do with the lock?
I'm running this code on Pop!_OS (Linux) with Python 3.11.
The fork method copies the parent process (on Linux, via copy-on-write fork(2)) and continues from that exact point, so the child already has the interpreter, the imported modules, and the shared Value in place. The spawn method instead starts a fresh Python interpreter, re-imports your main module, and pickles the Process target and its arguments to recreate that state in the child.
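You can observe the "recreates that state" part directly: under spawn the child re-imports the main module, so top-level code runs again in every child. A minimal sketch (the marker print is purely illustrative, not part of your benchmark):

import multiprocessing as mp

print("module top level executed")  # runs once under fork, once more per spawn child

def child():
    pass

if __name__ == "__main__":
    mp.set_start_method("spawn")
    p = mp.Process(target=child)
    p.start()
    p.join()
    # With spawn the marker prints twice (parent + re-importing child);
    # with fork it prints only once, because the child inherits the
    # already-imported module instead of re-importing it.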
As you can imagine, copying an already-running process is far cheaper than starting a new interpreter, re-importing every module, and unpickling the arguments. Your benchmark pays that startup cost 400 times (200 iterations × 2 processes), so the per-process overhead dominates; the lock contributes almost nothing either way.
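To check that the gap really is per-process startup overhead rather than anything in increment_value, you can time a do-nothing child under both methods in one run. A rough sketch using multiprocessing.get_context (which, unlike set_start_method, lets you use both start methods in the same process); the exact numbers will vary by machine:

import multiprocessing as mp
from time import time

def noop():
    pass  # no shared state, no lock: we measure pure process-creation cost

if __name__ == "__main__":
    for method in ("fork", "spawn"):
        ctx = mp.get_context(method)
        start = time()
        for _ in range(50):
            p = ctx.Process(target=noop)
            p.start()
            p.join()
        print(f"{method}: {time() - start:.4f} s for 50 no-op children")

The fork/spawn ratio should come out close to what you measured with the Value and the lock in place.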