Tags: object, multiprocessing, pickle, legacy

In a pickle: how to serialise legacy objects for submission to a Python multiprocessing pool


I have written a parallel job processor that accepts jobs (functions, their arguments, timeout information, etc.) and submits them to a Python multiprocessing pool. I can provide the full (long) code if requested, but the key step, as I see it, is the asynchronous application to the pool:

job.resultGetter = self.pool.apply_async(
    func = job.workFunction,
    kwds = job.workFunctionKeywordArguments
)

I am trying to use this parallel job processor with a large body of legacy code and, perhaps naturally, have run into pickling problems:

PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed

The error appears when I submit one of the problematic objects as an argument to a work function. The real difficulty is that this is legacy code, and I am advised that I can make only very minor changes to it. So... is there some clever trick or simple modification that would let my parallel job processor cope with these traditionally unpicklable objects? I have total control over the parallel job processor code, so I am open to, say, wrapping every submitted function in another function. For the legacy code, I should be able to add the occasional small method to objects, but that's about it.


Solution

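  • Use dill and pathos.multiprocessing instead of pickle and multiprocessing. dill can serialise many objects that pickle rejects, including instance methods, and pathos.multiprocessing is a fork of multiprocessing that uses dill for serialisation.

    See also: What can multiprocessing and dill do together?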

    http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization/

    How to pickle functions/classes defined in __main__ (python)
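
If replacing the pool itself is too big a change, the "wrap every submitted function" idea from the question can be combined with dill while keeping the standard multiprocessing pool. The following is only a minimal sketch under some assumptions: Python 3, dill installed, and hypothetical names (pack, run_packed, LegacyJob) that come from neither the question's code nor any library. The parent serialises the work function and its keyword arguments to bytes, and the pool only ever sees a module-level wrapper plus that bytes payload. The import falls back to pickle purely so the sketch still runs without dill; in that case it gains nothing over the original behaviour.

```python
import multiprocessing
import pickle

try:
    # Third-party; serialises many callables that pickle rejects.
    import dill as serializer
except ImportError:
    # Fallback so the sketch still runs; plain pickle keeps its usual limits.
    serializer = pickle


def pack(func, kwargs=None):
    """Serialise a callable and its keyword arguments to bytes (parent side)."""
    return serializer.dumps((func, kwargs or {}))


def run_packed(payload):
    """Module-level wrapper run in the worker: unpack the job, then call it."""
    func, kwargs = serializer.loads(payload)
    return func(**kwargs)


class LegacyJob(object):
    """Hypothetical stand-in for a legacy object whose bound method is the work function."""

    def __init__(self, factor):
        self.factor = factor

    def scale(self, value):
        return self.factor * value


if __name__ == "__main__":
    job = LegacyJob(factor=3)
    with multiprocessing.Pool(processes=2) as pool:
        # The pool pickles only run_packed and a bytes object, both of which
        # the standard pickle module handles without trouble.
        getter = pool.apply_async(
            func=run_packed,
            args=(pack(job.scale, {"value": 14}),),
        )
        print(getter.get(timeout=30))
```

In the job processor from the question, this would mean passing run_packed as func and pack(job.workFunction, job.workFunctionKeywordArguments) as the single positional argument. Return values still travel back through the standard pickle, so a work function that returns an unpicklable object would need its result packed in the same way.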