I need to call an instance method of a class in parallel and count the number of times it has been called, so that each call gets a unique identifier (used to store its results in a unique location).
Here is a question with solutions for what I want, but in Java.
Here is a minimal example:
para2.py, which sets up all the instance-method pickling stuff (less relevant):
from copy_reg import pickle
from types import MethodType
from para import func
def _pickle_method(method):
    return _unpickle_method, (method.im_func.__name__, method.im_self, method.im_class)
def _unpickle_method(func_name, obj, cls):
    return cls.__dict__[func_name].__get__(obj, cls)
pickle(MethodType, _pickle_method, _unpickle_method)
func()
And now para.py contains:
from sklearn.externals.joblib import Parallel, delayed
from math import sqrt
from multiprocessing import Lock
class Thing(object):
    COUNT = 0
    lock = Lock()
    def objFn(self, x):
        with Thing.lock:
            mecount = Thing.COUNT
            Thing.COUNT += 1
        print mecount
        n = 0
        while n < 10000000:  # add a little delay for consistency
            n += 1
        return sqrt(x)
def func():
    thing = Thing()
    y = Parallel(n_jobs=4)(delayed(thing.objFn)(i**2) for i in range(10))
    print y
Now running python para2.py in a terminal prints
0
0
0
0
1
1
1
1
2
2
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
I need the numbers in the vertical list to count from 0 to 9, but it appears that all four processes are still accessing and trying to update COUNT concurrently. How can I make this do what I want?
With multiprocessing, Python forks your process and runs the code in child processes. Each child gets its own copy of the class and its data; nothing is shared between the processes, so every child increments its own COUNT starting from 0. You can check this by placing a print statement such as
print multiprocessing.current_process().name
in your constructor and in objFn to see what is running where and what the value of COUNT is there.
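The copying described above can be reproduced with a small sketch. It is not the code from the question: it uses Python 3 syntax and plain multiprocessing rather than joblib, explicitly requests the fork start method (so it assumes a POSIX system), and the names bump and run_copy_demo are made up for illustration.

```python
import multiprocessing as mp


class Thing(object):
    COUNT = 0  # plain class attribute: every forked child gets its own copy


def bump(seen):
    # Runs in a child process: this modifies the child's copy of COUNT only.
    Thing.COUNT += 1
    seen.value = Thing.COUNT  # report what this child saw back to the parent
    print(mp.current_process().name, "sees COUNT =", Thing.COUNT)


def run_copy_demo(n_procs=3):
    # Assumption: the "fork" start method is available (POSIX only).
    ctx = mp.get_context("fork")
    seen = [ctx.Value("i", 0) for _ in range(n_procs)]
    procs = [ctx.Process(target=bump, args=(s,)) for s in seen]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # What each child saw, and the parent's untouched COUNT.
    return [s.value for s in seen], Thing.COUNT


if __name__ == "__main__":
    child_counts, parent_count = run_copy_demo()
    print(child_counts, parent_count)
```

Every child reports 1 and the parent's COUNT stays 0, which is the same effect as the repeated 0s and 1s in the output from the question.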
In order to share data between processes, you have to use something designed for this from the multiprocessing library: the Value and Array objects. These use shared memory and are therefore generally limited to ctypes types (such as integers and doubles), not arbitrary Python objects.
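As a minimal sketch of a shared counter built on Value, the following hands out unique identifiers across several processes. It is an illustration under assumptions rather than a drop-in fix for the joblib code: it uses Python 3 syntax, plain multiprocessing.Process with the fork start method (POSIX only), and the helper names next_id, worker and run_demo are hypothetical.

```python
import multiprocessing as mp


def next_id(counter):
    """Atomically read and increment the shared counter."""
    with counter.get_lock():  # the lock that comes with the Value
        value = counter.value
        counter.value += 1
    return value


def worker(counter, ids, task_indices):
    # Each task records the unique identifier it was given.
    for i in task_indices:
        ids[i] = next_id(counter)


def run_demo(n_tasks=10, n_workers=4):
    # Assumption: the "fork" start method is available (POSIX only).
    ctx = mp.get_context("fork")
    counter = ctx.Value("i", 0)           # shared int, starts at 0
    ids = ctx.Array("i", [-1] * n_tasks)  # shared slots for the results
    procs = [
        ctx.Process(target=worker,
                    args=(counter, ids, range(w, n_tasks, n_workers)))
        for w in range(n_workers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return sorted(ids[:])


if __name__ == "__main__":
    print(run_demo())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Which task receives which identifier depends on scheduling, but each value from 0 to 9 is handed out exactly once because the read and the increment happen under the same lock.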