I have a list of class instances and want to call the same instance method on each of them in parallel. I use pathos so that the instance method can be pickled. The real problem comes when I want to change or add an attribute on the instances: it doesn't work. I think this is because pickling to the subprocess makes a deep copy of the inputs. Does anyone have an idea how to solve this? I don't want to change the way the instance method is written (for example, returning a value and putting the results together later).
from joblib import Parallel, delayed
import pathos.multiprocessing as mp
# import multiprocessing as mp
import random
import os

pool = mp.Pool(mp.cpu_count())

class Person(object):
    def __init__(self, name):
        self.name = name

    def print_name(self, num):
        self.num = num
        print "worker {}, person name {}, received int {}".format(os.getpid(), self.name, self.num)

people = [Person('a'),
          Person('b'),
          Person('c'),
          Person('d'),
          Person('e'),
          Person('f'),
          Person('g'),
          Person('h')]

for i, per in enumerate(people):
    pool.apply_async(Person.print_name, (per, i))

pool.close()
pool.join()

print 'their number'
for per in people:
    print per.num
This is the output. The num attribute is not found; I think this is because the change is made on the copies, not on the original instances.
In [1]: run delme.py
worker 13981, person name a, random int 0
worker 13982, person name b, random int 1
worker 13983, person name c, random int 2
worker 13984, person name d, random int 3
worker 13985, person name e, random int 4
worker 13986, person name f, random int 5
worker 13987, person name g, random int 6
worker 13988, person name h, random int 7
their number
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/chimerahomes/wenhoujx/brain_project/network_analysis/delme.py in <module>()
39 print 'their number'
40 for per in people:
---> 41 print per.num
AttributeError: 'Person' object has no attribute 'num'
Following the suggestion in the comments, I tried returning self from the child process, but there seems to be a pathos bug: the returned self is NOT of its original type. See the following code:
import pickle
# from joblib import Parallel, delayed
import pathos.multiprocessing as mp
# import multiprocessing as mp
import random
import os

pool = mp.Pool(mp.cpu_count())

class Person(object):
    def __init__(self, name):
        self.name = name

    def print_name(self, num):
        self.num = num
        print "worker {}, person name {}, received int {}".format(os.getpid(), self.name, self.num)
        # return itself and put everything together
        return self

people = [Person('a'),
          Person('b'),
          Person('c'),
          Person('d'),
          Person('e'),
          Person('f'),
          Person('g'),
          Person('h')]

# Parallel(n_jobs=-1)(delayed(Person.print_name)(per) for per in people)
res = []
for i, per in enumerate(people):
    res.append(pool.apply_async(Person.print_name, (per, i)))

pool.close()
pool.join()

people = [rr.get() for rr in res]

print 'their number'
for per in people:
    print per.num

print isinstance(people[0], Person)
And this is the output:
In [1]: run delme.py
worker 29963, person name a, received int 0
worker 29962, person name b, received int 1
worker 29964, person name c, received int 2
worker 29962, person name d, received int 3
worker 29966, person name e, received int 4
worker 29967, person name f, received int 5
worker 29966, person name g, received int 6
worker 29967, person name h, received int 7
their number
0
1
2
3
4
5
6
7
False
When I use the default multiprocessing package instead, there is no such problem.
The problem is that self.num is assigned in the child process. multiprocessing does not pass the original object back to the caller. It does pass the method's return value back. So you could pass num back directly, or even self (but that is generally inefficient and doesn't replace the existing object in the parent; it just creates a new one).
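If you want to keep the instance method exactly as written, one option is a small module-level wrapper that calls the method on the worker's copy and returns the copy's attributes, which the parent then merges into its own objects. The following is only a minimal sketch under some assumptions: it uses the standard library multiprocessing rather than pathos, it is written for Python 3, and the helper names call_and_return_state and run_all are made up for illustration. It also assumes every attribute the method sets can be pickled.

```python
import multiprocessing as mp
import os


class Person:
    def __init__(self, name):
        self.name = name

    def print_name(self, num):
        # Same method as in the question: mutates self, returns nothing.
        self.num = num
        print("worker {}, person name {}, received int {}".format(
            os.getpid(), self.name, self.num))


def call_and_return_state(person, num):
    # Hypothetical helper: runs in the worker on a pickled copy of the
    # person and sends the copy's attribute dict back to the parent.
    person.print_name(num)
    return person.__dict__


def run_all(people, pool):
    # Submit one task per person, then merge each returned state into the
    # parent's original object so existing references stay valid.
    pending = [pool.apply_async(call_and_return_state, (per, i))
               for i, per in enumerate(people)]
    for per, result in zip(people, pending):
        per.__dict__.update(result.get())
    return people


if __name__ == "__main__":
    people = [Person(name) for name in "abcd"]
    with mp.Pool(2) as pool:
        run_all(people, pool)
    print([per.num for per in people])
```

Because the parent updates its own objects instead of replacing them with the returned copies, isinstance checks and any other references to the original instances keep working. The cost is that the whole attribute dict is pickled on the way back, so for large objects it may be cheaper to return only the attributes that changed.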