python multiprocessing pickle

PicklingError: What can cause this error in a function?


I have a function that takes a list of objects, two lists of ints and an int (an ID) as parameters, and returns a tuple of two lists of ints. The function works well, but as my list of IDs grows it takes a lot of time. Having already used multiprocessing in other projects, I thought this was a good fit for a multiprocessing Pool.

However, I get an error _pickle.PicklingError when launching it.

I have spent the past few days looking for alternative ways of doing this: I discovered pathos ProcessPool, which runs forever with no indication of the problem. I also tried ThreadingPool, as an accepted answer suggested, but it does not fit my case since it does not use multiple CPUs and doesn't speed up the process.

Here is a sample of my function. It is not a reproducible example since it is specific to my case, but I believe the function is pretty clear: it returns a tuple of two lists, built in a for loop.

def getNormalOnConnectedElements(elem, mapping, idList, node):
    normalZ = []
    eids = []
    for e in mapping[node]:
        if e in idList:
            normalZ.append(elem[e].Normal()[2])
            eids.append(e)
    return normalZ, eids

I tried calling it as I usually do:

from functools import partial
from itertools import repeat
from multiprocessing import Pool

with Pool(4) as p:
    # with functools.partial()
    result = p.map(partial(getNormalOnConnectedElements, elemList, mapping, idList), nodeList)
    # or with itertools.repeat()
    result = p.starmap(getNormalOnConnectedElements, zip(repeat(elemList), repeat(mapping), repeat(idList), nodeList))

I made sure the function is defined at the top level, and the call is within an if __name__ == "__main__": block.

So the question is: what in this function causes pickle to throw _pickle.PicklingError?

Edit: here is the traceback:

  File "<input>", line 1, in <module>
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.2.2\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.2.2\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/TLEP6OQM/Documents/Anaconda/PLoad tool/model.py", line 209, in <module>
    allVec = p.map(partial(getNormalOnConnectedElements, elem, allElemIds, mapping), myFilter)
  File "C:\ProgramData\Anaconda3\envs\myenv\lib\multiprocessing\pool.py", line 290, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\ProgramData\Anaconda3\envs\myenv\lib\multiprocessing\pool.py", line 683, in get
    raise self._value
  File "C:\ProgramData\Anaconda3\envs\myenv\lib\multiprocessing\pool.py", line 457, in _handle_tasks
    put(task)
  File "C:\ProgramData\Anaconda3\envs\myenv\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "C:\ProgramData\Anaconda3\envs\myenv\lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function getNormalOnConnectedElements at 0x00000257E6785620>: attribute lookup getNormalOnConnectedElements on __main__ failed

Solution

  • If anyone stumbles upon this question: the reason this error happened even with a very simple function is the way I was running the Python script. As ShadowRanger explained well in the comments, the function needs to be defined at the top level of the main module. PyCharm's "Run File in Python Console" does not simply run the script; it executes it through a wrapper, so pickle cannot find the function in __main__.

    When the file is run as a regular script, for example with python myscript.py, no error is raised.
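
For reference, a script laid out as described above might look like the following sketch. The FakeElement class, the build_sample helper and the sample data are assumptions made up for illustration (the real element objects are not shown in the question); only getNormalOnConnectedElements comes from the original post. It is meant to be saved to a file and run with python myscript.py, not through a console wrapper.

```python
from functools import partial
from multiprocessing import Pool


class FakeElement:
    """Hypothetical stand-in for the real element objects."""

    def __init__(self, normal):
        self._normal = normal

    def Normal(self):
        return self._normal


def getNormalOnConnectedElements(elem, mapping, idList, node):
    # Defined at module top level so pickle can find it by name.
    normalZ = []
    eids = []
    for e in mapping[node]:
        if e in idList:
            normalZ.append(elem[e].Normal()[2])
            eids.append(e)
    return normalZ, eids


def build_sample():
    """Made-up data: two elements, two nodes."""
    elem = {1: FakeElement((0.0, 0.0, 1.0)), 2: FakeElement((0.0, 1.0, 0.0))}
    mapping = {10: [1, 2], 11: [2]}
    idList = [1, 2]
    nodeList = [10, 11]
    return elem, mapping, idList, nodeList


def main():
    elem, mapping, idList, nodeList = build_sample()
    worker = partial(getNormalOnConnectedElements, elem, mapping, idList)
    with Pool(2) as p:
        return p.map(worker, nodeList)


if __name__ == "__main__":
    # The guard keeps child processes from re-running the Pool on import.
    print(main())
```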