
Are class attributes and memory shared between processes in a process pool?


I have a class A that, when instantiated, modifies a mutable class attribute nums.

When instantiating the class via a process pool with maxtasksperchild=1, I notice that nums contains values that seem to come from several different processes, which is undesirable behavior for me.

My questions are:

  • Are the processes sharing memory?
  • Am I misunderstanding maxtasksperchild and how a process pool works?

EDIT: I am guessing that the pool pickles the previous processes it started (and not the original one) and thus preserves the values of nums. Is that correct? If so, how can I force it to use the original process?

Here is some example code:

from multiprocessing import Pool


class A:
    nums = []

    def __init__(self, num=None):
        self.__class__.nums.append(num)  # I use 'self.__class__' for the sake of explicitly
        print(self.__class__.nums)
        assert len(self.__class__.nums) < 2  # checking that they don't share memory


if __name__ == '__main__':
    with Pool(maxtasksperchild=1) as pool:
        pool.map(A, range(99))  # the assert is being raised

EDIT in response to the answer by k.wahome: using instance attributes doesn't answer my question. I need class attributes because in my original code (not shown here) I have several instances per process. My question is specifically about how a multiprocessing pool works.


By the way, the following does work:

from multiprocessing import Process

if __name__ == '__main__':
    prs = []
    for i in range(99):
        pr = Process(target=A, args=[i])
        pr.start()
        prs.append(pr)
    [pr.join() for pr in prs]
# the assert was not raised

Solution

  • What you observe has a different cause. The values in nums do not come from other processes; they come from the same process, which ends up hosting multiple instances of A. This happens because you didn't set chunksize to 1 in your pool.map call. Setting maxtasksperchild=1 alone is not enough here, because a single task still consumes a whole chunk of the iterable.

    The docs about map say:

    "This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer."
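To make the chunking effect visible, here is a minimal sketch based on the class from the question. The helper functions seen_in_worker and max_seen, and the choice of processes=4, are assumptions added for illustration and are not part of the original code. Each worker reports how many values nums holds after creating an instance, so the parent can compare the default chunking with chunksize=1.

```python
from multiprocessing import Pool


class A:
    nums = []  # class attribute: shared by all instances within ONE process

    def __init__(self, num=None):
        self.__class__.nums.append(num)


def seen_in_worker(num):
    """Hypothetical helper: create an A in the worker and report len(nums) there."""
    A(num)
    return len(A.nums)


def max_seen(chunksize):
    """Hypothetical helper: largest len(nums) observed in any worker process."""
    # processes=4 is an arbitrary choice so the result does not depend on CPU count.
    with Pool(processes=4, maxtasksperchild=1) as pool:
        return max(pool.map(seen_in_worker, range(99), chunksize=chunksize))


if __name__ == '__main__':
    # Default chunking: one task is a chunk of several items, so a single
    # worker creates several instances and nums grows beyond one element.
    print(max_seen(chunksize=None))
    # chunksize=1: one task is one item, and maxtasksperchild=1 replaces the
    # worker after that task, so each process only ever sees a single value.
    print(max_seen(chunksize=1))
```

Applied to the code in the question, this would mean calling pool.map(A, range(99), chunksize=1). The trade-off is that every item then costs a fresh worker process, which is noticeably slower than reusing workers across larger chunks.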