Tags: python, python-3.x, class, multiprocessing, fork

Pass arguments through self in class instance while multiprocessing in Python


It seems to work, but is it safe to use self after forking? Or should I always pass arguments to the subprocess as function parameters through args?

import multiprocessing as mp

class C():

    def __init__(self):
        self.v = 'bla'
        p = mp.Process(target=self.worker, args=[])
        #p = mp.Process(target=self.worker, args=(self.v,))
        p.start()
        p.join()

    def worker(self):
        print(self.v)

    #def worker(self, v):
        #print(v)

c = C()

# prints 'bla'

To be more specific, I want to pass manager.Queue() objects; I'm not sure if that makes a difference.

If this were a simple C fork(), self would be the same, since the whole process is copied identically (except for the pid). But Python's multiprocessing may be doing something I am not aware of, or there may be a warning somewhere like "don't use it like this; this may change in the future". I did not find anything addressing this specific question.

My actual worry is that arguments passed in args, especially ones associated with the multiprocessing module, may be transformed around fork() to avoid whatever problems.

Python 3.6.5


Solution

  • For anything other than the fork start method, both the target and the arguments are sent to the worker processes using pickling when Process.start() is called. For the fork start method, the child process is forked at that same point, i.e. when Process.start() is called.

    So when you don't use the fork start method, what you need to worry about is whether your data can be pickled. If it can, there is no reason to avoid using a class instance and self: the target is a bound method, which carries a reference to its instance, so the whole instance is pickled along with it:

    >>> import pickle, pickletools
    >>> class C:
    ...     def __init__(self):
    ...         self.v = 'bla'
    ...     def worker(self):
    ...         print(self.v)
    ...
    >>> c = C()
    >>> data = pickle.dumps(c.worker)
    >>> pickletools.dis(data)
        0: \x80 PROTO      4
        2: \x95 FRAME      71
       11: \x8c SHORT_BINUNICODE 'builtins'
       21: \x94 MEMOIZE    (as 0)
       22: \x8c SHORT_BINUNICODE 'getattr'
       31: \x94 MEMOIZE    (as 1)
       32: \x93 STACK_GLOBAL
       33: \x94 MEMOIZE    (as 2)
       34: \x8c SHORT_BINUNICODE '__main__'
       44: \x94 MEMOIZE    (as 3)
       45: \x8c SHORT_BINUNICODE 'C'
       48: \x94 MEMOIZE    (as 4)
       49: \x93 STACK_GLOBAL
       50: \x94 MEMOIZE    (as 5)
       51: )    EMPTY_TUPLE
       52: \x81 NEWOBJ
       53: \x94 MEMOIZE    (as 6)
       54: }    EMPTY_DICT
       55: \x94 MEMOIZE    (as 7)
       56: \x8c SHORT_BINUNICODE 'v'
       59: \x94 MEMOIZE    (as 8)
       60: \x8c SHORT_BINUNICODE 'bla'
       65: \x94 MEMOIZE    (as 9)
       66: s    SETITEM
       67: b    BUILD
       68: \x8c SHORT_BINUNICODE 'worker'
       76: \x94 MEMOIZE    (as 10)
       77: \x86 TUPLE2
       78: \x94 MEMOIZE    (as 11)
       79: R    REDUCE
       80: \x94 MEMOIZE    (as 12)
       81: .    STOP
    highest protocol among opcodes = 4
    

    In the above stream you can clearly see v, 'bla' and worker named.
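
    You can also verify the round trip directly. A quick sketch (same toy class as above, run in a single process) showing that unpickling a bound method reconstructs the instance it was bound to:

```python
import pickle

class C:
    def __init__(self):
        self.v = 'bla'

    def worker(self):
        print(self.v)

c = C()
# Pickling the bound method serialises the instance together with
# the method name, as the opcode stream above shows.
data = pickle.dumps(c.worker)

restored = pickle.loads(data)
restored()                     # prints 'bla'
# The rebuilt method is bound to a fresh copy of the instance,
# not the original object.
print(restored.__self__ is c)  # prints False
print(restored.__self__.v)     # prints 'bla'
```

    This is exactly what happens under the spawn and forkserver start methods: the child rebuilds a copy of the instance from the pickle stream.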

    If you do use the fork start method, then the child process simply has full access to everything that was in memory in the parent process; self is still referencing the same object you had before forking. Your OS takes care of the details there, such as ensuring that file descriptors are independent, and that the child process gets a copy of the memory blocks that are being altered.
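
    As a sketch of the difference (POSIX-only, since the fork start method isn't available on Windows): under fork nothing is pickled, so even an unpicklable attribute such as a lambda is still visible in the child:

```python
import multiprocessing as mp

class C:
    def __init__(self):
        # A lambda can't be pickled, but under fork the child receives
        # a memory copy of the whole process, so it never needs to be.
        self.fn = lambda: 'bla'

    def worker(self, q):
        q.put(self.fn())

ctx = mp.get_context('fork')   # explicit: no pickling of target or self
q = ctx.Queue()
c = C()
p = ctx.Process(target=c.worker, args=(q,))
p.start()
result = q.get()               # 'bla', computed in the child
p.join()
print(result)
```

    The same code fails with a pickling error under the spawn start method, which is why code meant to be portable should stick to picklable attributes.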

    Either way, changes the child process makes to the instance won't be visible in the parent process, unless you explicitly use data structures designed to be shared.