Can't pickle Pyparsing expression with setParseAction() method. Needed for multiprocessing


My original issue is that I am trying to do the following:

def submit_decoder_process(decoder, input_line):
    decoder.process_line(input_line)
    return decoder

self.pool = Pool(processes=num_of_processes)
self.pool.apply_async(submit_decoder_process, [decoder, input_line]).get()

decoder is too involved to describe in full here, but the important point is that it is an object initialized with a PyParsing expression on which setParseAction() has been called. Such an expression cannot be pickled, and because multiprocessing pickles everything it sends to a worker process, the code above fails.

Now, here is the pickle/PyParsing problem that I have isolated and simplified. The following code yields an error message due to pickle failure.

import pickle
from pyparsing import Word, nums

def my_pa_func():
    pass

pickle.dumps(Word(nums).setParseAction(my_pa_func))

Error message:

pickle.PicklingError: Can't pickle <function wrapper at 0x00000000026534A8>: it's not found as pyparsing.wrapper

Now if you remove the .setParseAction(my_pa_func) call, it works with no problems:

pickle.dumps(Word(nums))
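
The error message names a function called wrapper inside pyparsing, which suggests that setParseAction() stores the callback wrapped in a nested function. Plain pickle serializes functions by name lookup, and a function defined inside another function cannot be looked up that way. The sketch below reproduces the same kind of failure without pyparsing at all; wrap() and can_pickle() are hypothetical stand-ins written for illustration, not pyparsing code.

```python
import pickle


def my_pa_func():
    pass


def wrap(func):
    # Hypothetical stand-in for the wrapping that setParseAction() appears
    # to perform: the returned function is defined inside another function.
    def wrapper(*args):
        return func()
    return wrapper


def can_pickle(obj):
    # Report whether plain pickle can serialize obj. The exception type
    # raised for nested functions differs between Python versions.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


nested_ok = can_pickle(wrap(my_pa_func))
print("nested wrapper picklable:", nested_ok)
```

If this is indeed the cause, any object that holds such a wrapper will fail in the same way, including the parser expression above.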

How can I get around this? multiprocessing uses pickle, so I guess I can't avoid it. The pathos package, which supposedly uses dill, does not seem mature enough; at least, I am having problems installing it on my 64-bit Windows machine. I am really scratching my head here.


Solution

  • OK, here is the solution inspired by rocksportrocker: Python multiprocessing pickling error

    The idea is to dill the object that can't be pickled before passing it between processes, and then "undill" it on the receiving side:

    from multiprocessing import Pool
    import dill
    
    def submit_decoder_process(decoder_dill, input_line):
        decoder = dill.loads(decoder_dill)  # undill after it was passed to a pool process
        decoder.process_line(input_line)
        return dill.dumps(decoder)  # dill before passing back to parent process
    
    self.pool = Pool(processes=num_of_processes)
    
    # Dill the decoder before sending it to a pool process, then undill the copy that comes back
    result = self.pool.apply_async(submit_decoder_process, [dill.dumps(decoder), input_line])
    decoder_processed = dill.loads(result.get())
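
The round trip above can be tried out without a pool or a pyparsing object. The following is a minimal sketch, assuming dill is installed; make_counter() and worker() are hypothetical names used only for illustration, and the pickle.dumps()/pickle.loads() pair stands in for the hand-off that multiprocessing performs on arguments. The point it shows is that the dilled payload is an ordinary bytes object, which plain pickle handles without trouble.

```python
import pickle

import dill


def make_counter(start):
    # Returns a nested function, which plain pickle cannot serialize.
    def counter(step):
        return start + step
    return counter


def worker(payload, step):
    # Undill on the receiving side, use the object, then dill it again
    # before handing it back (mirrors submit_decoder_process above).
    func = dill.loads(payload)
    return dill.dumps(func), func(step)


counter = make_counter(10)
payload = dill.dumps(counter)  # bytes, safe to send between processes

# Simulate the transfer: multiprocessing pickles the arguments it sends.
wire = pickle.loads(pickle.dumps(payload))

returned_payload, result = worker(wire, 5)
restored = dill.loads(returned_payload)
print(result, restored(1))
```

Note that the object that comes back is a copy, so the parent has to replace its own reference with the undilled result, as the solution does with decoder_processed.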