Tags: python, parallel-processing, functional-programming, multiprocessing, functools

Python multiprocessing map function error


I'm trying to create a simple multiprocessing example. The version using the ordinary map() function works, but when I switch to Pool.map I get a strange error:

from multiprocessing import Pool
from functools import partial
x = [1,2,3]
y = 10
f = lambda x,y: x**2+y

# ordinary map works:
map(partial(f,y=y),x)
# [11, 14, 19]

# multiprocessing map does not
p = Pool(4)
p.map(partial(f, y=y), x)
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

Pickling error? What is this exactly?


Solution

  • The arguments to Pool.map must be picklable. Functions defined with def at module level are picklable, but here partial(f, y=y) wraps a lambda, which pickle cannot look up by name, so it is not picklable.

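    There is a simple workaround: define a module-level function, and do so before creating the pool (with the fork start method, worker processes only see names that existed when the pool was created):

    def g(x, y=y):
        return f(x, y)
    
    p = Pool(4)  # create the pool only after g exists
    p.map(g, x)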
    

    Functions made with functools.partial used to be unpicklable. However, with Python 2.7 or later, you can also define g (at the module level) using functools.partial:

    import multiprocessing as mp
    import functools
    
    def f(x, y):
        return x**2 + y
    
    x = [1,2,3]
    y = 10
    
    g = functools.partial(f, y=y)
    
    if __name__ == '__main__':
        p = mp.Pool()
        print(p.map(g, x))
    

    yields [11, 14, 19]. Note that to get this result, f had to be defined with def rather than lambda. This is because pickle serializes a function by reference: it stores the module and name, then looks the function up again when loading. A lambda's name is <lambda>, which cannot be found that way.
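
If you want to check picklability yourself before handing something to Pool.map, a minimal sketch along these lines may help. It assumes Python 3 and is meant to be run as a script. The names is_picklable and f_lambda are hypothetical helpers made up for this illustration; they are not part of pickle or multiprocessing.

```python
import functools
import pickle


def f(x, y):
    # Module-level def: pickle stores only a reference (module + name).
    return x**2 + y


# Hypothetical lambda twin of f, kept only to show the failure case.
f_lambda = lambda x, y: x**2 + y


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds (illustrative helper)."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == '__main__':
    print(is_picklable(f))                                   # True
    print(is_picklable(functools.partial(f, y=10)))          # True
    print(is_picklable(f_lambda))                            # False
    print(is_picklable(functools.partial(f_lambda, y=10)))   # False
```

The partial object itself pickles fine. What decides the outcome is whether the function it wraps can be found again by name, which is why the def version works with Pool.map and the lambda version does not.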