Tags: python-2.7, multiprocessing, pool, python-multiprocessing

Pooling a member function in Python 2.7


I am getting a weird error with this code. I am trying to pool instances of a worker function that is a member of the class invoking the pool. While I had my doubts about whether this would work, I am not sure of the exact reason why it fails. The error thrown when I run this is a PicklingError. Can someone explain why?

import multiprocessing
import time


class Pooler(multiprocessing.Process):
  def __init__(self):
    multiprocessing.Process.__init__(self)

  def run(self):
    pool = multiprocessing.Pool(10)
    print "starting pool"
    pool.map(self.worker, xrange(10), chunksize=10)

  def worker(self, arg):
    print "worker - arg - {}".format(arg)


if __name__ == '__main__':
  jobs = []
  for i in range(5):
    proc = Pooler()
    jobs.append(proc)
    proc.start()

  for j in jobs:
    j.join()

  print "...ending"

UPDATE

I changed the code to look as follows:

import multiprocessing
import time


class Pooler(multiprocessing.Process):
  def __init__(self):
    multiprocessing.Process.__init__(self)

  def run(self):
    pool = multiprocessing.Pool(1)
    print "starting pool"
    obj = Worker()
    pool.map(obj.run, range(10), chunksize=1)

class Worker(object):
  def __init__(self):
    pass

  def run(self, nums):
    print "worker - arg - {}".format(nums)


if __name__ == '__main__':
  jobs = []
  for i in range(1):
    proc = Pooler()
    jobs.append(proc)
    proc.start()

  for j in jobs:
    j.join()

  print "...ending"

but I am still getting the following error:

starting pool
Process Pooler-1:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "pool_test.py", line 13, in run
    pool.map(obj.run, range(10), chunksize=1)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed
...ending

Solution

  • The answer is simple: multiprocessing uses pickle to serialize objects and pass them between processes, and, as the error states, pickle can't serialize an instancemethod. You need a better serializer, such as dill, if you want to serialize an instancemethod (see https://stackoverflow.com/a/21345273/2379433).

    So what can you do about multiprocessing? Fortunately, there is a fork of multiprocessing called multiprocess that uses dill, and if you use it, your objects will serialize and your code will work. It's a one-line change that comes with the bonus of being able to run from the interpreter, and it can serialize almost all objects in Python. (The link above is for pathos and dill, but pathos is built on top of multiprocess, so it is still relevant.)

    >>> import multiprocess as multiprocessing
    >>> import time
    >>> class Pooler(multiprocessing.Process):
    ...   def __init__(self):
    ...     multiprocessing.Process.__init__(self)
    ...   def run(self):
    ...     pool = multiprocessing.Pool(1)
    ...     print "starting pool"
    ...     obj = Worker()
    ...     pool.map(obj.run, range(10), chunksize=1)
    ... 
    >>> class Worker(object):
    ...   def __init__(self):
    ...     pass
    ...   def run(self, nums):
    ...     print "worker - arg - {}".format(nums)
    ... 
    >>> if __name__ == '__main__':
    ...   jobs = []
    ...   for i in range(1):
    ...     proc = Pooler()
    ...     jobs.append(proc)
    ...     proc.start()
    ...   for j in jobs:
    ...     j.join()
    ...   print "...ending"
    ... 
    starting pool
    worker - arg - 0
    worker - arg - 1
    worker - arg - 2
    worker - arg - 3
    worker - arg - 4
    worker - arg - 5
    worker - arg - 6
    worker - arg - 7
    worker - arg - 8
    worker - arg - 9
    ...ending
    >>>
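    If switching to multiprocess is not an option, a common workaround with the standard library alone is to route the work through a module-level function, which pickle can serialize by name. This is a minimal sketch (the `run_worker` helper is hypothetical, not part of the original code):

    ```python
    import multiprocessing

    # Module-level functions are picklable by reference (their module and
    # name), so Pool.map can ship them to worker processes.
    def run_worker(arg):
        return "worker - arg - {}".format(arg)

    if __name__ == '__main__':
        pool = multiprocessing.Pool(2)
        results = pool.map(run_worker, range(3))
        pool.close()
        pool.join()
        print(results)
    ```

    The trade-off is that a plain function has no access to instance state; any data the worker needs must be passed explicitly as arguments.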