Tags: python, python-3.x, class, multiprocessing, fork

Pass arguments through self in class instance while multiprocessing in Python


It seems to work, but is it safe to use self after forking? Or should I always pass arguments to the subprocess as function parameters through args?

import multiprocessing as mp

class C():

    def __init__(self):
        self.v = 'bla'
        p = mp.Process(target=self.worker, args=[])
        #p = mp.Process(target=self.worker, args=(self.v,))
        p.start()
        p.join()

    def worker(self):
        print(self.v)

    #def worker(self, v):
        #print(v)

c = C()

# prints 'bla'

To be more specific, I want to pass manager.Queue() objects; I'm not sure if that makes a difference.

If this were a simple C fork(), self would be the same, since the whole process is copied identically (except for the pid). But Python's multiprocessing may be doing something I am not aware of, or there may be a warning somewhere like "don't use it like this; this may change in the future". I did not find anything addressing this specific question.

My actual worry is that arguments passed in args, especially ones associated with the multiprocessing module, may be transformed around fork() to avoid whatever problems.

Python 3.6.5


Solution

  • For anything other than the fork start method, both the target and the arguments are sent to the worker processes using pickling when Process.start() is called. For the fork start method, the child process is forked at that same point, i.e. when Process.start() is called.

    So when you don't use the fork start method, what you need to worry about is whether your data can be pickled. If it can, there is no reason to avoid using a class instance and self: the target is a bound method, which carries a reference to its instance, so the whole instance is pickled along with it:

    >>> import pickle, pickletools
    >>> class C:
    ...     def __init__(self):
    ...         self.v = 'bla'
    ...     def worker(self):
    ...         print(self.v)
    ...
    >>> c = C()
    >>> data = pickle.dumps(c.worker)
    >>> pickletools.dis(data)
        0: \x80 PROTO      4
        2: \x95 FRAME      71
       11: \x8c SHORT_BINUNICODE 'builtins'
       21: \x94 MEMOIZE    (as 0)
       22: \x8c SHORT_BINUNICODE 'getattr'
       31: \x94 MEMOIZE    (as 1)
       32: \x93 STACK_GLOBAL
       33: \x94 MEMOIZE    (as 2)
       34: \x8c SHORT_BINUNICODE '__main__'
       44: \x94 MEMOIZE    (as 3)
       45: \x8c SHORT_BINUNICODE 'C'
       48: \x94 MEMOIZE    (as 4)
       49: \x93 STACK_GLOBAL
       50: \x94 MEMOIZE    (as 5)
       51: )    EMPTY_TUPLE
       52: \x81 NEWOBJ
       53: \x94 MEMOIZE    (as 6)
       54: }    EMPTY_DICT
       55: \x94 MEMOIZE    (as 7)
       56: \x8c SHORT_BINUNICODE 'v'
       59: \x94 MEMOIZE    (as 8)
       60: \x8c SHORT_BINUNICODE 'bla'
       65: \x94 MEMOIZE    (as 9)
       66: s    SETITEM
       67: b    BUILD
       68: \x8c SHORT_BINUNICODE 'worker'
       76: \x94 MEMOIZE    (as 10)
       77: \x86 TUPLE2
       78: \x94 MEMOIZE    (as 11)
       79: R    REDUCE
       80: \x94 MEMOIZE    (as 12)
       81: .    STOP
    highest protocol among opcodes = 4
    

    In the above stream you can clearly see v, 'bla' and worker named.
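
    You can also verify the round trip directly. A quick sketch (same toy class as above, run in a single process) showing that unpickling a bound method reconstructs the instance it was bound to:

```python
import pickle

class C:
    def __init__(self):
        self.v = 'bla'

    def worker(self):
        print(self.v)

c = C()
# Pickling the bound method serialises the instance together with
# the method name, as the opcode stream above shows.
data = pickle.dumps(c.worker)

restored = pickle.loads(data)
restored()                     # prints 'bla'
# The rebuilt method is bound to a fresh copy of the instance,
# not the original object.
print(restored.__self__ is c)  # prints False
print(restored.__self__.v)     # prints 'bla'
```

    This is exactly what happens under the spawn and forkserver start methods: the child rebuilds a copy of the instance from the pickle stream.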

    If you do use the fork start method, then the child process simply has full access to everything that was in memory in the parent process; self is still referencing the same object you had before forking. Your OS takes care of the details there, such as ensuring that file descriptors are independent, and that the child process gets a copy of the memory blocks that are being altered.
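
    As a sketch of the difference (POSIX-only, since the fork start method isn't available on Windows): under fork nothing is pickled, so even an unpicklable attribute such as a lambda is still visible in the child:

```python
import multiprocessing as mp

class C:
    def __init__(self):
        # A lambda can't be pickled, but under fork the child receives
        # a memory copy of the whole process, so it never needs to be.
        self.fn = lambda: 'bla'

    def worker(self, q):
        q.put(self.fn())

ctx = mp.get_context('fork')   # explicit: no pickling of target or self
q = ctx.Queue()
c = C()
p = ctx.Process(target=c.worker, args=(q,))
p.start()
result = q.get()               # 'bla', computed in the child
p.join()
print(result)
```

    The same code fails with a pickling error under the spawn start method, which is why code meant to be portable should stick to picklable attributes.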

    Either way, changes the child process makes to the instance won't be visible in the parent process, unless you explicitly use data structures designed to be shared.