python, locking, multiprocessing, joblib

Counting parallel function calls in Python


I have a problem where I need to call an instance method of a class in parallel and count the number of times it has been called, so that each call gets a unique identifier (used to store its results in a unique location).

Here is a question with solutions for what I want, but in Java.

Here is a minimal example:

para2.py, which sets up all the instance-method pickling stuff (less relevant):

from copy_reg import pickle
from types import MethodType
from para import func

def _pickle_method(method):
    # Reduce a bound method to its function name, instance and class so it can be pickled
    return _unpickle_method, (method.im_func.__name__, method.im_self, method.im_class)

def _unpickle_method(func_name, obj, cls):
    # Look the function up on the class and re-bind it to the instance
    return cls.__dict__[func_name].__get__(obj, cls)

# Register the reducer so bound methods (MethodType) can cross process boundaries
pickle(MethodType, _pickle_method, _unpickle_method)

func()

And now para.py contains:

from sklearn.externals.joblib import Parallel, delayed
from math import sqrt
from multiprocessing import Lock

class Thing(object):

    COUNT = 0
    lock = Lock()

    def objFn(self, x):
        with Thing.lock:
            mecount = Thing.COUNT
            Thing.COUNT += 1

        print mecount

        n = 0
        while n < 10000000:  # add a little delay for consistency
            n += 1
        return sqrt(x)

def func():
    thing = Thing()

    y = Parallel(n_jobs=4)(delayed(thing.objFn)(i**2) for i in range(10))
    print y

Now running python para2.py in a terminal prints

0
0
0
0
1
1
1
1
2
2
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

I need those numbers, printed one per line, to count from 0 to 9, but it appears that the four processes are accessing and updating COUNT without ever seeing each other's increments. How can I make this do what I want?


Solution

  • With multiprocessing, Python forks your code and creates child processes in which to run it. In doing so it creates a separate copy of the class (including COUNT and lock) for each child process; the data is not shared between them. You can debug this a bit by placing print statements such as...

    print multiprocessing.current_process().name

    in your constructor and in your objFn to see which process is running what and what its copy of COUNT holds.
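    For instance, a minimal sketch of that kind of instrumentation, reusing the question's Thing class (the process names and interleaving you see will depend on your platform and joblib backend):

        import multiprocessing

        class Thing(object):

            COUNT = 0

            def objFn(self, x):
                # Each worker prints its own process name together with its own
                # private copy of COUNT, making it visible that nothing is shared.
                print multiprocessing.current_process().name, Thing.COUNT
                Thing.COUNT += 1
                return x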

    In order to share data between processes you have to use something designed for this from the multiprocessing library: the Value and Array objects. These live in shared memory, and because of that they are generally limited to simple ctypes types (integers, floats, fixed-size arrays), not arbitrary Python objects. A sketch using a shared Value follows below.
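    As a minimal sketch, para.py could be rewritten around a shared Value; this assumes the workers are forked from the parent (as the old multiprocessing backend behind sklearn.externals.joblib does on Linux), so that they inherit the shared-memory counter created at import time. get_lock() returns the lock that Value carries internally, so the separate Lock is no longer needed:

        from sklearn.externals.joblib import Parallel, delayed
        from math import sqrt
        from multiprocessing import Value

        class Thing(object):

            # 'i' is a signed int living in shared memory; forked workers all see
            # this one counter instead of a private copy of a class attribute.
            COUNT = Value('i', 0)

            def objFn(self, x):
                with Thing.COUNT.get_lock():  # Value provides its own lock
                    mecount = Thing.COUNT.value
                    Thing.COUNT.value += 1

                print mecount
                return sqrt(x)

        def func():
            thing = Thing()
            y = Parallel(n_jobs=4)(delayed(thing.objFn)(i**2) for i in range(10))
            print y

    With this, each call receives a unique number from 0 to 9, although the order in which the workers print those numbers is not guaranteed to be ascending.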