Search code examples
pythonmultiprocessingpickledillpathos

Python Multiprocessing: AttributeError: 'Test' object has no attribute 'get_type'


short short version:

I am having trouble parallelizing code which uses instance methods.

Longer version:

This python code produces the error:

Error
Traceback (most recent call last):
  File "/Users/gilzellner/dev/git/3.2.1-build/cloudify-system-tests/cosmo_tester/test_suites/stress_test_openstack/test_file.py", line 24, in test
self.pool.map(self.f, [self, url])
File "/Users/gilzellner/.virtualenvs/3.2.1-build/lib/python2.7/site-packages/pathos/multiprocessing.py", line 131, in map
return _pool.map(star(f), zip(*args)) # chunksize
File "/Users/gilzellner/.virtualenvs/3.2.1-build/lib/python2.7/site-packages/multiprocess/pool.py", line 251, in map
return self.map_async(func, iterable, chunksize).get()
File "/Users/gilzellner/.virtualenvs/3.2.1-build/lib/python2.7/site-packages/multiprocess/pool.py", line 567, in get
raise self._value
AttributeError: 'Test' object has no attribute 'get_type'

This is a simplified version of a real problem I have.

import urllib2
from time import sleep
from os import getpid
import unittest
from pathos.multiprocessing import ProcessingPool as Pool

class Test(unittest.TestCase):

    def f(self, x):
        print urllib2.urlopen(x).read()
        print getpid()
        return

    def g(self, y, z):
        print y
        print z
        return

    def test(self):
        url = "http://nba.com"
        self.pool = Pool(processes=1)
        for x in range(0, 3):
            self.pool.map(self.f, [self, url])
            self.pool.map(self.g, [self, url, 1])
        sleep(10)

I am using pathos.multiprocessing due to the recommendation here: Multiprocessing: Pool and pickle Error -- Pickling Error: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed

Before using pathos.multiprocessing, the error was:

"PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed"

Solution

  • You're using multiprocessing map method incorrectly.
    According to python docs:

    A parallel equivalent of the map() built-in function (it supports only one iterable argument though).

    Where standard map:

    Apply function to every item of iterable and return a list of the results.

    Example usage:

    from multiprocessing import Pool
    
    def f(x):
        return x*x
    
    if __name__ == '__main__':
        p = Pool(5)
        print(p.map(f, [1, 2, 3]))
    

    What you're looking for is apply_async method:

    def test(self):
        url = "http://nba.com"
        self.pool = Pool(processes=1)
        for x in range(0, 3):
            self.pool.apply_async(self.f, args=(self, url))
            self.pool.apply_async(self.g, args=(self, url, 1))
        sleep(10)