python multiprocessing pickle concurrent.futures

ProcessPoolExecutor: TypeError: cannot pickle 'PyCapsule' object


I'm facing an issue with concurrent.futures' ProcessPoolExecutor, which I'm trying to use in one of my classes:

    def __init__(self):
        self._pool = ProcessPoolExecutor()

    def handle_event(self):
        ...
        filepath: Path = xxxx
        # Submitting a bound method means `self` gets pickled as well
        future = self._pool.submit(self.test, filepath)
        res = future.result()
        self.logger.debug(res)

    def test(self, filepath: Path):
        print("Test ProcessPoolExecutor")
        return 3

This small piece of code behaves strangely. First, when I don't retrieve the result with future.result(), I see no output from the print call in the test() function.

Then, when I explicitly ask for the result of the future, I get a TypeError:

Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 239, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot pickle 'PyCapsule' object
  1. What I don't understand here is that the types I'm sending (Path) and receiving (int) are both picklable.

  2. Secondly, I don't even know what this PyCapsule dependency is, since it appears neither in my requirements.txt nor in the output of pip freeze (related SO post):

.nox/run/bin/pip freeze | grep -E '(capsule|dill)'

Any idea why I cannot see my print statements? What about the PyCapsule error? Is it an issue with my types, or with something else in the application?

Thanks!


Solution

  • When you schedule a job on concurrent.futures.ProcessPoolExecutor, you ship both the function and its arguments to a child process. A plain function is pickled by reference, that is, as its module and qualified name. The child process receives that name and looks it up within its own memory.

    Because you are passing a bound method (self.test), things get more complicated. The child process cannot find such a method by name alone, since it is tied to a specific instance. Therefore, the parent has to pickle and ship the whole object (self) along with it.

    This is where you encounter the problem:

    cls(buf, protocol).dump(obj)
    

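    The pickle protocol does not know how to serialize your object because it contains unpicklable components. In particular, PyCapsule is an internal CPython type that wraps an opaque C pointer and is typically created by C extension modules. It is not a package you install, which is why it shows up in neither requirements.txt nor pip freeze. Because serialization fails in the parent before the task reaches a worker, test() never runs and nothing is printed; the exception is stored on the future and only surfaces when you call future.result().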
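
To see this failure in isolation, here is a minimal sketch that pickles a bound method directly, which is roughly what submit() has to do. The Handler class and its _lock attribute are hypothetical; a threading.Lock is used only as a convenient stand-in for the PyCapsule, since pickle rejects both.

```python
import pickle
import threading


class Handler:
    """Hypothetical stand-in for the class in the question."""

    def __init__(self):
        # Assumption: a lock plays the role of the unpicklable attribute.
        # In the question, the culprit is a PyCapsule held somewhere in
        # the instance's attributes.
        self._lock = threading.Lock()

    def test(self, filepath):
        return 3


handler = Handler()

try:
    # Pickling a bound method also pickles the instance it is bound to,
    # so every attribute of `handler` must be picklable.
    pickle.dumps(handler.test)
    bound_method_pickles = True
except (TypeError, pickle.PicklingError) as exc:
    bound_method_pickles = False
    print(f"bound method failed to pickle: {exc}")
```

The arguments (Path) and the return value (int) never come into play here: the failure happens while serializing the callable itself.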

    Passing object methods to process pools is discouraged, because it is hard to predict whether an object will be picklable or not. Moreover, you pay the increased cost of serializing an entire object and transferring it through a pipe instead of just shipping the function name.
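
One way to follow this recommendation is to move the work into a module-level function, assuming the worker does not need any state from the instance. In the sketch below, process_file, Handler, and the path example.txt are placeholder names chosen for illustration, not part of the original code.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def process_file(filepath: Path) -> int:
    """Module-level worker (hypothetical name), pickled by reference only."""
    print(f"Test ProcessPoolExecutor: {filepath}")
    return 3


class Handler:
    """Hypothetical stand-in for the class in the question."""

    def __init__(self, pool):
        self._pool = pool

    def handle_event(self, filepath: Path) -> int:
        # Only the function reference and the Path argument are pickled;
        # `self` never crosses the process boundary.
        future = self._pool.submit(process_file, filepath)
        return future.result()


if __name__ == "__main__":
    # The guard keeps child processes from re-running this block when the
    # start method re-imports the main module.
    with ProcessPoolExecutor(max_workers=1) as pool:
        print(Handler(pool).handle_event(Path("example.txt")))
```

If the worker does need some of the instance's data, passing just those values as extra arguments keeps the payload small and predictable.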

    For more information about what can and cannot be pickled, refer to the pickle module documentation.

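    If you cannot follow the above recommendation, you can look at alternative pickle implementations such as dill. Note that ProcessPoolExecutor uses the standard pickle module internally, so switching serializers is not a drop-in change.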
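
Before reaching for another serializer, it can help to find out which attribute is blocking pickle. The following is a rough diagnostic sketch: find_unpicklable_attrs and Example are hypothetical names, and a threading.Lock again stands in for whatever unpicklable object the real class holds. It only inspects the instance's own attributes, so it may need to be applied again to a nested object.

```python
import pickle
import threading


def find_unpicklable_attrs(obj):
    """Return {attribute name: error text} for attributes that fail to pickle."""
    failures = {}
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # pickle can raise several exception types
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


class Example:
    """Hypothetical object with one picklable and one unpicklable attribute."""

    def __init__(self):
        self.count = 3
        self.lock = threading.Lock()


report = find_unpicklable_attrs(Example())
print(report)
```

Running such a check on self inside handle_event would point at the attribute that holds the PyCapsule.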