
How do the Python main process and a forked process share GC information?


From the post https://stackoverflow.com/a/53504673/9191338 :

spawn: objects are cloned and each is finalised in each process

fork/forkserver: objects are shared and finalised in the main process

This seems to be the case:

import os
from multiprocessing import Process
import multiprocessing
multiprocessing.set_start_method('fork', force=True)

class Track:
    def __init__(self):
        print(f'{os.getpid()=} object created in {__name__=}')
    
    def __getstate__(self):
        print(f'{os.getpid()=} object pickled in {__name__=}')
        return {}
    
    def __setstate__(self, state):
        # __setstate__ restores state in place; its return value is ignored
        print(f'{os.getpid()=} object unpickled in {__name__=}')

    def __del__(self):
        print(f'{os.getpid()=} object deleted in {__name__=}')

def f(x):
    print(f'{os.getpid()=} function executed in {__name__=}')

if __name__ == '__main__':

    x = Track()

    for i in range(2):
        print(f'{os.getpid()=} Iteration: {i}, Process object created')
        p = Process(target=f, args=(x,))
        print(f'{os.getpid()=} Iteration: {i}, Process created and started')
        p.start()
        print(f'{os.getpid()=} Iteration: {i}, Process starts to run functions')
        p.join()

The output is:

os.getpid()=30620 object created in __name__='__main__'
os.getpid()=30620 Iteration: 0, Process object created
os.getpid()=30620 Iteration: 0, Process created and started
os.getpid()=30620 Iteration: 0, Process starts to run functions
os.getpid()=30623 function executed in __name__='__main__'
os.getpid()=30620 Iteration: 1, Process object created
os.getpid()=30620 Iteration: 1, Process created and started
os.getpid()=30620 Iteration: 1, Process starts to run functions
os.getpid()=30624 function executed in __name__='__main__'
os.getpid()=30620 object deleted in __name__='__main__'

Indeed the object is only deleted in the main process.

My question is: how is this achieved? The new process is forked from the main process, but after the fork it is essentially a separate process, so how can the two processes share GC information?

In addition, does this sharing of GC information happen for every object, or only for the objects passed as arguments to the subprocess?


Solution

  • On Linux, when child processes are created with the fork start method, Python terminates the child with os._exit() (note the underscore) once the target function returns. This skips normal interpreter shutdown, much as if the process had crashed: destructors of objects that are still alive at that point never get a chance to run, the process simply terminates, and the OS reclaims whatever resources were allocated to it. No GC information is shared; the child has its own copy of the object and exits without finalising it.

    So you shouldn't rely on destructors being called in child processes to release resources (for example, to shut down an external server).
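
    The behaviour described above can be checked with a small experiment. This is a minimal sketch, assuming a POSIX system where the fork start method is available. The Track class, worker, drain and demo are hypothetical names used only for this illustration, and a multiprocessing SimpleQueue stands in for whatever resource needs releasing. It records every close() and __del__ call together with the PID that made it.

```python
import multiprocessing as mp
import os


class Track:
    """Hypothetical resource that records how and where it was released."""

    def __init__(self, name, log):
        self.name = name
        self.log = log  # SimpleQueue: put() writes straight to a pipe

    def close(self):
        self.log.put((self.name, "close", os.getpid()))

    def __del__(self):
        self.log.put((self.name, "__del__", os.getpid()))


def worker(inherited, log):
    # 'inherited' is the child's own copy of the parent's object. It is
    # still referenced when the child exits, so its __del__ never runs here.
    local = Track("local", log)
    try:
        pass  # real work would go here
    finally:
        local.close()  # explicit cleanup runs before the child exits


def drain(log):
    """Collect every event currently waiting in the queue."""
    events = []
    while not log.empty():
        events.append(log.get())
    return events


def demo():
    ctx = mp.get_context("fork")  # assumption: POSIX, fork is available
    log = ctx.SimpleQueue()
    inherited = Track("inherited", log)

    p = ctx.Process(target=worker, args=(inherited, log))
    p.start()
    p.join()
    child_events = drain(log)  # everything the child recorded

    del inherited  # last reference in the parent: __del__ runs here
    parent_events = drain(log)
    return child_events, parent_events


if __name__ == "__main__":
    child_events, parent_events = demo()
    print("child: ", child_events)
    print("parent:", parent_events)
```

    In this sketch the child should report the explicit close() of its local object, and that object's __del__ as well, since it becomes garbage while the child is still running. It should never report a __del__ for the inherited object, which is finalised only in the parent. This is why explicit cleanup (try/finally, or a context manager) is the safer choice for anything that must be released in the child.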