Tags: python, python-3.x, multiprocessing, functools

Script using multiprocessing with partial and map fails on Python 3 but works fine on 2.7: cannot pickle '_thread.lock'


Until today I used the following code on Python 2.7 to parallelize the creation of many PNG pictures with matplotlib. Today I tried to move everything to Python 3.8, and the part that I cannot adapt is the parallelization done with multiprocessing.

The idea is that I have a script which needs to produce several images with similar settings from different timesteps of a data file. As the plotting routine can be parametrized I'm executing it over chunks of 10 timesteps distributed among different tasks to speed up the process.

Here is the relevant part of the script; I'm not pasting the whole thing given its length.

from multiprocessing import Pool
from functools import partial

def main():
    # arguments to be passed to the plotting functions
    # contain data and information about the plot
    args = dict(m=m, x=x, y=y, ax=ax,
                 winds_10m=winds_10m, mslp=mslp, ....)

    # chunks of timesteps 
    dates = chunks(time, 10)
    # partial version of the function plot_files(), see underneath 
    plot_files_param = partial(plot_files, **args)
    p = Pool(8)
    p.map(plot_files_param, dates)

def plot_files(dates, **args):
    first = True
    for date in dates:
        # loop over dates, retrieve data from args (e.g. args['mslp']) and do the plotting
        pass  # plotting code omitted

if __name__ == "__main__":
    import time
    start_time = time.time()
    main()
    elapsed_time=time.time()-start_time
    print_message("script took " + time.strftime("%H:%M:%S", time.gmtime(elapsed_time)))
        

This used to work fine on Python 2.7, but now I get this error:

Traceback (most recent call last):
  File "plot_winds10m.py", line 135, in <module>
    main()
  File "plot_winds10m.py", line 79, in main
    p.map(plot_files_param, dates)
  File "lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
  File "lib/python3.8/multiprocessing/pool.py", line 537, in _handle_tasks
    put(task)
  File "lib/python3.8/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot pickle '_thread.lock' object

The only thing that changed, besides the Python version and the package versions, is the system. I'm testing this on macOS instead of Linux, but it should not make a big difference, especially since this is all running inside a conda environment.

Does anyone have an idea on how to fix this?

(here is the link to the github repo https://github.com/guidocioni/icon_forecasts/blob/master/plotting/plot_winds10m.py )


Solution

  • I figured out the problem in case anyone arrives here desperate for an answer.

    The problem is that some of the conversions I was doing with metpy.unit_array produce a pint array which, for some reason, is not picklable. Passing this array in the args of the partial function then triggered the error.

    Doing the conversion with .convert_units() instead, or just extracting the array part from the data (with either .values or .magnitude), ensured that I was passing only a numpy array or a DataArray, and these objects are picklable.
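
If you hit the same error and don't know which argument is to blame, a quick check along these lines may help. It is only a sketch: find_unpicklable and example_args are names made up for illustration and are not part of the original script, and a threading.Lock stands in for the unpicklable pint array. The idea is simply to try pickle.dumps on every value you plan to pass to partial, before handing anything to the Pool.

```python
import pickle
import threading


def find_unpicklable(kwargs):
    """Return {name: error text} for every value that pickle rejects."""
    bad = {}
    for name, value in kwargs.items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # pickle may raise TypeError, PicklingError, ...
            bad[name] = f"{type(exc).__name__}: {exc}"
    return bad


# Stand-in arguments (assumption): the lock plays the role of the
# problematic array; the other values pickle without trouble.
example_args = dict(x=[1, 2, 3], y=(4.0, 5.0), mslp=threading.Lock())

for name, error in find_unpicklable(example_args).items():
    print(f"{name} cannot be pickled -> {error}")
```

In the real script you would call it on the args dict right before building the partial. Any name it reports is a candidate for conversion to a plain numpy array, as described above.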