python, class, multiprocessing, dill, pathos

Python multiprocessing: change class instances in place


I have a list of class instances, and I want to call the same instance method on each of them in parallel. I use pathos so that the instance method can be pickled. The real problem is that when the method changes or adds an attribute on an instance, the change is lost. I think this is because pickling the inputs for a subprocess effectively makes a deep copy of them. Does anyone have an idea how to solve this? I don't want to change the way the instance method is written (for example, by returning a value and combining the results later).

from joblib import Parallel, delayed
import pathos.multiprocessing as mp
# import multiprocessing as mp 
import random
import os

pool = mp.Pool(mp.cpu_count())

class Person(object):
    def __init__(self, name):
        self.name = name

    def print_name(self, num):
        self.num = num
        print "worker {}, person name {}, received int {}".format(os.getpid(), self.name, self.num)


people = [Person('a'),
          Person('b'),
          Person('c'),
          Person('d'),
          Person('e'),
          Person('f'),
          Person('g'),
          Person('h')]


for i, per in enumerate(people):
    pool.apply_async(Person.print_name, (per, i) )

pool.close()
pool.join()
print 'their number'
for per in people:
    print per.num

This is the output. The num attribute is not found; I think this is because the change was made on those copies.

In [1]: run delme.py
worker 13981, person name a, random int 0
worker 13982, person name b, random int 1
worker 13983, person name c, random int 2
worker 13984, person name d, random int 3
worker 13985, person name e, random int 4
worker 13986, person name f, random int 5
worker 13987, person name g, random int 6
worker 13988, person name h, random int 7
their number
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/chimerahomes/wenhoujx/brain_project/network_analysis/delme.py in <module>()
     39 print 'their number'
     40 for per in people:
---> 41     print per.num

AttributeError: 'Person' object has no attribute 'num'
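The copying suspected above can be reproduced without any pool. Passing an argument to a worker means pickling it in the parent and unpickling it in the child, so a pickle.dumps / pickle.loads round trip is a reasonable stand-in. This is a minimal sketch assuming Python 3 and the standard pickle module (pathos serializes with dill instead, but the result is likewise a new object in the worker):

```python
import pickle


class Person(object):
    def __init__(self, name):
        self.name = name

    def print_name(self, num):
        self.num = num
        print("person name {}, received int {}".format(self.name, self.num))


original = Person("a")

# Roughly what apply_async does with its arguments: serialize them in the
# parent, then rebuild them in the worker. pickle stands in for both steps here.
worker_copy = pickle.loads(pickle.dumps(original))

# The "worker" mutates its own copy...
worker_copy.print_name(0)

print(worker_copy is original)      # False: the worker got a distinct object
print(hasattr(worker_copy, "num"))  # True
print(hasattr(original, "num"))     # False: the parent's instance is unchanged
```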

Following a suggestion in the comments, I tried returning self from the child process, but there seems to be a pathos bug: the returned self is NOT of its original type. See the following code:

import pickle
# from joblib import Parallel, delayed
import pathos.multiprocessing as mp
# import multiprocessing as mp 
import random
import os

pool = mp.Pool(mp.cpu_count())

class Person(object):
    def __init__(self, name):
        self.name = name

    def print_name(self, num):
        self.num = num
        print "worker {}, person name {}, received int {}".format(os.getpid(), self.name, self.num)
        # return itself and put everything together
        return self



people = [Person('a'),
          Person('b'),
          Person('c'),
          Person('d'),
          Person('e'),
          Person('f'),
          Person('g'),
          Person('h')]

# Parallel(n_jobs=-1)(delayed(Person.print_name)(per) for per in people)

res = []
for i, per in enumerate(people):
    res.append(pool.apply_async(Person.print_name, (per, i) ))

pool.close()
pool.join()
people = [rr.get() for rr in res]


print 'their number'
for per in people:
    print per.num

print isinstance(people[0], Person)

and this is the output:

In [1]: run delme.py
worker 29963, person name a, received int 0
worker 29962, person name b, received int 1
worker 29964, person name c, received int 2
worker 29962, person name d, received int 3
worker 29966, person name e, received int 4
worker 29967, person name f, received int 5
worker 29966, person name g, received int 6
worker 29967, person name h, received int 7
their number
0
1
2
3
4
5
6
7
False

When I use the standard multiprocessing package instead, this problem does not occur.
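For comparison, the standard pickle module records an instance's class by reference (its module and qualified name), so unpickling in the parent looks up the existing Person class and isinstance holds. A minimal check, assuming Python 3 and a script run directly so that Person lives in __main__. The False seen with pathos is consistent with dill serializing the Person class itself rather than a reference to it, because the class is defined in __main__; treat that as an assumption about dill, not something verified here.

```python
import pickle


class Person(object):
    def __init__(self, name):
        self.name = name


p = Person("a")
restored = pickle.loads(pickle.dumps(p))

# Standard pickle stores the class as a reference ("__main__.Person"),
# so the restored instance points at the very same class object.
print(type(restored) is Person)      # True
print(isinstance(restored, Person))  # True
print(restored is p)                 # False: still a copy of the instance
```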


Solution

  • The problem is that self.num is assigned in the child process. multiprocessing does not pass the original object back to the caller; it only passes back the method's return value. So you could return num directly, or even self (but that is generally inefficient, and it does not replace the existing object in the parent; it just creates a new one).
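A minimal sketch of that suggestion which still leaves print_name unchanged, assuming Python 3 and the standard multiprocessing module. The helpers run_and_collect and apply_states are hypothetical names introduced here: the worker calls the method on its copy and returns that copy's attribute dictionary, and the parent copies those attributes onto its original instances with __dict__.update. Existing references to those instances therefore see num, and the class-identity question above never arises because the parent does not keep the returned objects.

```python
import multiprocessing as mp
import os


class Person(object):
    def __init__(self, name):
        self.name = name

    def print_name(self, num):
        self.num = num
        print("worker {}, person name {}, received int {}".format(
            os.getpid(), self.name, self.num))


def run_and_collect(person, num):
    """Hypothetical helper: runs in the worker on a *copy* of person and
    returns that copy's attributes so the parent can apply them."""
    person.print_name(num)
    return vars(person)


def apply_states(people, states):
    """Hypothetical helper: copy each returned attribute dict onto the
    parent's own instance, updating it in place."""
    for person, state in zip(people, states):
        person.__dict__.update(state)


if __name__ == "__main__":
    people = [Person(name) for name in "abcdefgh"]

    with mp.Pool() as pool:
        # starmap unpacks each (person, i) tuple into run_and_collect(person, i)
        # and returns the results in input order.
        states = pool.starmap(run_and_collect,
                              [(p, i) for i, p in enumerate(people)])

    apply_states(people, states)

    print("their number")
    for per in people:
        print(per.num)
```

When only one attribute changes, returning num alone and assigning people[i].num = result works just as well; the dictionary form helps when the method may set several attributes.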