Python multiprocess with class instance


My question is not about a problem I have, but about why something is not a problem. It may be a bit dumb, but I am not very familiar with classes and I am trying to learn. Say I have a class defined as follows:

import numpy as np
import multiprocessing as mp


class Foo(object):
    def __init__(self, a):
        self.a = a

    def Sum(self, b):
        self.a = np.random.randint(10)
        return self.a + b, self.a

and I create an object:

foo = Foo(1)

Then I want to compute the result of Sum for different values of b, in parallel across several processes:

def Calc(b):
    return foo.Sum(b)

pool = mp.Pool(processes=2)
b = [0, 1, 2, 3]
out = pool.map(Calc, b)
print(out)

which prints, for example (the values are random, so they differ between runs):

[(8, 8), (5, 4), (3, 1), (7, 4)]

This is correct. My question is: how can the different processes modify the attribute a of foo at the same time without affecting each other? In this example the operation is quick, but in my real-world case it takes several seconds, if not minutes, hence the parallelization.


Solution

  • Each process is self-contained and has its own memory; the processes share no state. Once foo ends up in different processes, the copies are no longer the same object: there are several of them, each doing its own thing. Your question isn't really about classes or class instances but about what happens in different processes.

    Printing the id of the instance along with its a attribute illustrates this.

    import multiprocessing as mp
    import numpy as np
    
    class Foo(object):
        def __init__(self, a):
            self.a = a
        def Sum(self, b):
            s = f'I am {id(self)}, a before={self.a}'
            self.a = np.random.randint(10)
            print(f'{s} | a after={self.a}')
            return self.a + b, self.a
    
    foo = Foo(1)
    
    def Calc(b):
        return foo.Sum(b)
    
    if __name__ == '__main__':
    
        print(f'original foo id:{id(foo)}')
    
        pool = mp.Pool(processes=2)
        b = [0, 1, 2, 3, 5, 6, 7, 8]
        out = pool.map(Calc, b)
        print(out)
        print(f'{id(foo)}.a is still {foo.a}') 
        # stop the worker processes now that all results are in
        pool.terminate()
    

    Then running from a command prompt:

    PS C:\pyprojects> py -m tmp
    original foo id:2235026702928
    I am 1850261105632, a before=1 | a after=4
    I am 1905926138848, a before=1 | a after=1
    I am 1850261105632, a before=4 | a after=8
    I am 1905926138848, a before=1 | a after=9
    I am 1850261105632, a before=8 | a after=2
    I am 1905926138848, a before=9 | a after=9
    I am 1850261105632, a before=2 | a after=7
    I am 1905926138848, a before=9 | a after=3
    [(4, 4), (2, 1), (10, 8), (12, 9), (7, 2), (15, 9), (14, 7), (11, 3)]
    2235026702928.a is still 1
    

    Returning the process id with each result and sorting by it groups the calls by worker:

    import multiprocessing as mp
    import numpy as np
    import os
    
    class Foo(object):
        def __init__(self, a):
            self.a = a
        def Sum(self, b):
            s = f'I am {id(self)}, a: before={self.a}'
            self.a = np.random.randint(10)
            s = f'{s} | after={self.a}'
            return os.getpid(),s,(self.a + b, self.a),b
    
    foo = Foo(1)
    
    def Calc(b):
        return foo.Sum(b)
    
    if __name__ == '__main__':
    
        print(f'original foo id:{id(foo)}')
    
        pool = mp.Pool(processes=2)
        b = [0, 1, 2, 3, 5, 6, 7, 8]
        out = pool.map(Calc, b)
        out.sort(key=lambda x: (x[0],x[-1]))
        for result in out:
            print(f'pid:{result[0]} b:{result[-1]} {result[1]} {result[2]}')
        print(f'{id(foo)}.a is still {foo.a}')
        pool.terminate()
    

    ...

    PS C:\pyprojects> py -m tmp
    original foo id:2466513417648
    pid:10460 b:1 I am 2729330535728, a: before=1 | after=2 (3, 2)
    pid:10460 b:3 I am 2729330535728, a: before=2 | after=5 (8, 5)
    pid:10460 b:6 I am 2729330535728, a: before=5 | after=2 (8, 2)
    pid:10460 b:8 I am 2729330535728, a: before=2 | after=2 (10, 2)
    pid:13100 b:0 I am 2799588470064, a: before=1 | after=1 (1, 1)
    pid:13100 b:2 I am 2799588470064, a: before=1 | after=6 (8, 6)
    pid:13100 b:5 I am 2799588470064, a: before=6 | after=8 (13, 8)
    pid:13100 b:7 I am 2799588470064, a: before=8 | after=0 (7, 0)
    2466513417648.a is still 1
    PS C:\pyprojects>
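
The same point can be made with a deterministic sketch that needs only the standard library. This is an illustration under assumptions, not the code from the question: Counter, calc and run_demo are hypothetical names, and the random assignment is replaced by a plain addition so the outcome in the parent is predictable.

```python
import multiprocessing as mp
import os


class Counter:
    """Hypothetical stand-in for Foo: `a` is ordinary per-instance state."""

    def __init__(self, a):
        self.a = a

    def add(self, b):
        # Mutates only the copy that lives in the calling process.
        self.a += b
        return self.a


# Module-level instance: each worker process ends up with its own copy.
counter = Counter(1)


def calc(b):
    # Runs in a worker; reports the worker's pid and the new value of its copy.
    return os.getpid(), counter.add(b)


def run_demo():
    # The context manager shuts the pool down once map() has returned.
    with mp.Pool(processes=2) as pool:
        results = pool.map(calc, [10, 20, 30, 40])
    # The workers never touched the parent's instance.
    return results, counter.a


if __name__ == "__main__":
    results, parent_a = run_demo()
    for pid, value in results:
        print(f"pid {pid}: worker copy now holds a={value}")
    print(f"parent still holds a={parent_a}")
```

Whatever the workers do to their copies, the parent's counter.a stays at 1. If the parent needs the updated value, the worker has to return it, as Sum already does with self.a, and the parent has to store it itself.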