
multiprocessing fork() vs spawn()


I was reading the descriptions of the two start methods in the Python docs:

spawn

The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary to run the process object's run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver. [Available on Unix and Windows. The default on Windows and macOS.]

fork

The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic. [Available on Unix only. The default on Unix.]

My questions are:

  1. Is fork much quicker because it does not try to identify which resources to copy?
  2. Since fork duplicates everything, does it "waste" many more resources compared to spawn()?

Solution

    1. Is fork much quicker because it does not try to identify which resources to copy?

    Yes, it's much quicker. The kernel clones the whole process and copies a memory page only when one side modifies it. With spawn, the parent has to boot a fresh interpreter from scratch and pipe the pickled resources to it; fork needs neither.
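If you want to measure the startup difference yourself, here is a rough sketch. The names `noop` and `startup_seconds` are made up for illustration, and the numbers will vary a lot by machine and Python version, so treat it as a way to see the trend rather than a benchmark.

```python
import multiprocessing as mp
import time


def noop():
    """The child does nothing, so we mostly time process startup and teardown."""


def startup_seconds(method, repeats=5):
    # Average wall-clock time to start and join one child with this start method.
    ctx = mp.get_context(method)
    start = time.perf_counter()
    for _ in range(repeats):
        child = ctx.Process(target=noop)
        child.start()
        child.join()
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":  # required for spawn: the child re-imports this module
    for method in ("fork", "spawn"):
        if method in mp.get_all_start_methods():  # fork is not available on Windows
            ms = startup_seconds(method) * 1000
            print(f"{method}: {ms:.1f} ms per process")
```

On a typical Linux box I would expect fork to come out well ahead, since spawn pays for a full interpreter boot each time.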

    2. Since fork duplicates everything, does it "waste" many more resources compared to spawn()?

    Fork on modern kernels is "copy-on-write": only memory pages that actually change get copied. The caveat is that in CPython a "write" already happens when you merely iterate over an object, because doing so increments the object's reference count, and that count is stored in the object itself.
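The reference-count effect is easy to see without any child processes. A minimal sketch, assuming CPython (`sys.getrefcount` reports implementation-specific numbers, so only the difference matters here):

```python
import sys

payload = [object() for _ in range(3)]
first = payload[0]

before = sys.getrefcount(first)
# We only "read" payload here, yet every object in it gains a reference,
# which means CPython writes to the memory of each object.
references = [item for item in payload]
after = sys.getrefcount(first)

print(f"refcount before: {before}, after: {after}")
```

`after` ends up one higher than `before`. In a forked child, that write lands on a page shared with the parent, so the kernel has to copy the page even though the program never changed the object's value.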

    If you have long-running processes with lots of small objects in use, this can mean you end up wasting more memory than with spawn. Anecdotally, I recall Facebook claiming a considerable reduction in memory usage after switching from "fork" to "spawn" for their Python processes.
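If you want to try such a switch, the start method can be picked per context with `multiprocessing.get_context`. The sketch below also shows the behavioral difference from the docs; `CONFIG`, `report`, and `main` are hypothetical names used only for this illustration.

```python
import multiprocessing as mp

# Hypothetical module-level state, set when the module is imported.
CONFIG = {"loaded": False}


def report(method):
    # Runs in the child and prints what the child sees.
    print(f"{method}: child sees CONFIG = {CONFIG}", flush=True)


def main():
    CONFIG["loaded"] = True  # changed in the parent only, after import
    for method in ("fork", "spawn"):
        if method not in mp.get_all_start_methods():
            continue  # e.g. fork is not available on Windows
        ctx = mp.get_context(method)
        child = ctx.Process(target=report, args=(method,))
        child.start()
        child.join()


if __name__ == "__main__":  # required for spawn: the child re-imports this module
    main()
```

Run as a script, the forked child should report the parent's mutated `CONFIG`, because it starts as a clone of the parent. The spawned child re-imports the module in a fresh interpreter and should see the import-time value instead.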