I have a function that takes a list of objects, two lists of int and an int (an ID) as parameters, and returns a tuple of two lists of int. This function works very well, but when my list of IDs grows, it takes a lot of time. Since I have already used multiprocessing in other projects, this seemed like an appropriate situation for a multiprocessing `Pool`.
However, I get a `_pickle.PicklingError` when launching it.
I have spent the past few days looking for alternative ways of doing this. I discovered pathos's `ProcessPool`, but it runs forever with no indication of the problem. I have also tried `ThreadingPool`, as an accepted answer suggested, but it is obviously not suited to my issue: it does not use multiple CPUs (in CPython, threads running Python code are serialized by the GIL), so it doesn't speed up the process.
Here is a sample of my function. It is not a reproducible example since it is specific to my case, but I believe the function is pretty clear: it returns a tuple of two lists built in a for loop.
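def getNormalOnConnectedElements(elem, mapping, idList, node):
    # For every element connected to `node` whose ID is in idList, collect
    # the Z component of its normal and the matching element ID.
    normalZ = []
    eids = []
    for e in mapping[node]:
        if e in idList:
            normalZ.append(elem[e].Normal()[2])
            eids.append(e)
    return normalZ, eids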
I tried calling it as I usually do:
from functools import partial
from itertools import repeat
from multiprocessing import Pool

with Pool(4) as p:
    # with functools.partial()
    result = p.map(partial(getNormalOnConnectedElements, elemList, mapping, idList), nodeList)
    # or with itertools.repeat()
    result = p.starmap(getNormalOnConnectedElements, zip(repeat(elemList), repeat(mapping), repeat(idList), nodeList))
I made sure the function is defined at the top level, and that the call is within an `if __name__ == "__main__":` block.
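For reference, here is a minimal self-contained sketch of the setup described above. The real element objects are not shown in the question, so `Element`, `build_sample_data` and `run_parallel` are hypothetical names with toy data; note that the real elements must themselves be picklable for this pattern to work. Saved to a file and run with `python script.py`, both the `partial` and the `starmap` calls should succeed, because `getNormalOnConnectedElements` and `Element` can be found by name in `__main__`.

```python
from functools import partial
from itertools import repeat
from multiprocessing import Pool


class Element:
    """Hypothetical stand-in for the real element objects.

    Defined at module top level so instances can be pickled and sent
    to the worker processes.
    """

    def __init__(self, normal):
        self._normal = normal

    def Normal(self):
        return self._normal


def getNormalOnConnectedElements(elem, mapping, idList, node):
    # Same logic as the function in the question.
    normalZ = []
    eids = []
    for e in mapping[node]:
        if e in idList:
            normalZ.append(elem[e].Normal()[2])
            eids.append(e)
    return normalZ, eids


def build_sample_data():
    # Hypothetical toy data: element ID -> Element, node -> connected element IDs.
    elemList = {
        1: Element((0.0, 0.0, 1.0)),
        2: Element((0.0, 1.0, 0.0)),
        3: Element((1.0, 0.0, -1.0)),
    }
    mapping = {10: [1, 2], 20: [2, 3], 30: [3]}
    idList = [1, 3]
    nodeList = [10, 20, 30]
    return elemList, mapping, idList, nodeList


def run_parallel(processes=4):
    elemList, mapping, idList, nodeList = build_sample_data()
    with Pool(processes) as p:
        # with functools.partial(): the fixed arguments are bound up front
        via_partial = p.map(
            partial(getNormalOnConnectedElements, elemList, mapping, idList), nodeList
        )
        # or with itertools.repeat(): one argument tuple per node
        via_starmap = p.starmap(
            getNormalOnConnectedElements,
            zip(repeat(elemList), repeat(mapping), repeat(idList), nodeList),
        )
    return via_partial, via_starmap


if __name__ == "__main__":
    # The Pool is only created when the file is run as the main script.
    via_partial, via_starmap = run_parallel()
    assert via_partial == via_starmap
    print(via_partial)
```

For this toy data both calls return `[([1.0], [1]), ([-1.0], [3]), ([-1.0], [3])]`, one `(normalZ, eids)` tuple per node.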
So the question is: what in this function causes pickle to throw `_pickle.PicklingError`?
Edit: here is the full traceback:
  File "<input>", line 1, in <module>
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.2.2\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.2.2\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/TLEP6OQM/Documents/Anaconda/PLoad tool/model.py", line 209, in <module>
    allVec = p.map(partial(getNormalOnConnectedElements, elem, allElemIds, mapping), myFilter)
  File "C:\ProgramData\Anaconda3\envs\myenv\lib\multiprocessing\pool.py", line 290, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\ProgramData\Anaconda3\envs\myenv\lib\multiprocessing\pool.py", line 683, in get
    raise self._value
  File "C:\ProgramData\Anaconda3\envs\myenv\lib\multiprocessing\pool.py", line 457, in _handle_tasks
    put(task)
  File "C:\ProgramData\Anaconda3\envs\myenv\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "C:\ProgramData\Anaconda3\envs\myenv\lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function getNormalOnConnectedElements at 0x00000257E6785620>: attribute lookup getNormalOnConnectedElements on __main__ failed
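If anyone stumbles upon this question: the error happened even with a very simple function because of the way I was running the Python script. As ShadowRanger explained well in the comments, the function needs to be defined at the top level of the `__main__` module, because pickle serializes a function by reference (its module and name) and must be able to find it again under that name. Within PyCharm, "Run File in Python Console" does not simply run the file; it executes it through a wrapper (the pydev `runfile`/`execfile` frames in the traceback above), so the function is not found as an attribute of `__main__` and the lookup fails.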
Running the file the regular way (e.g. `python myscript.py` from a terminal) raises no error.
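To see why the wrapper matters, here is a simplified imitation of what the traceback suggests, assuming the console runs the source with `exec` in a namespace other than the real `__main__` module's (this is not PyCharm's actual code). The `def` then lands in a plain dict, yet the function's `__module__` still says `"__main__"`, so pickle's lookup by name fails. `top_level_function`, `wrapped_function` and `can_pickle` are hypothetical names.

```python
import pickle


def top_level_function(x):
    # Defined normally in the script, so it is an attribute of __main__.
    return x * 2


def can_pickle(func):
    """Return True if pickle can serialize func (stored by module + qualified name)."""
    try:
        pickle.dumps(func)
        return True
    except pickle.PicklingError:
        return False


# Simplified imitation of a console wrapper (an assumption, not PyCharm's
# real implementation): run the source with exec() and separate dicts for
# globals and locals. The def is stored in `loc`, not in the __main__ module,
# but the function's __module__ is taken from glob["__name__"].
source = "def wrapped_function(x):\n    return x * 2\n"
glob = {"__name__": "__main__"}
loc = {}
exec(source, glob, loc)

print(loc["wrapped_function"].__module__)  # prints __main__
print("top-level:", can_pickle(top_level_function))
print("wrapped:", can_pickle(loc["wrapped_function"]))
```

Run as a plain script, the first check succeeds and the second fails with the same kind of lookup error seen in the traceback (the exact message wording varies between Python versions).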