The following code:
import numpy as np, pandas as pd
import multiprocessing, itertools, timeit
from functools import partial

processes = 5 * multiprocessing.cpu_count()
print(f'processes: {processes}')
pool = multiprocessing.Pool(processes=processes)

def calc(x, y):
    return x+y

def calc_all():
    pairs = [[1,1], [2,2], [3,3]]
    results = pool.map(calc, pairs)
    print(results)

if __name__ == '__main__':
    calc_all()
is returning:
  File "/usr/local/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/multiprocessing/pool.py", line 114, in worker
    task = get()
           ^^^^^
  File "/usr/local/lib/python3.12/multiprocessing/queues.py", line 389, in get
    return _ForkingPickler.loads(res)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't get attribute 'calc' on <module '__main__' from '/workspaces/calc.py'>
If I move just the main block to a separate file and import calc_all, I still get the same error (with a different module name, of course).
If I move calc_all and the main block to another module and import calc, it works fine.
I would like to understand why this happens when both are top-level functions. Is there a better way to solve this than moving part of the module to a separate file?
I'm not able to replicate it; I get into a weird loop instead. When asking questions like this, you need to include your environment: Python version, OS, etc.
Also, there's no point in spawning 5*cpu_count processes. For CPU-bound work, having more workers than cores just adds overhead.
It works fine when I clean it up, creating the pool inside a function that only runs from the main guard and having calc take a single argument:
from functools import partial
import multiprocessing, itertools, timeit
import numpy as np
import pandas as pd

def setup():
    # Create the pool inside a function, so it is only built when called
    processes = multiprocessing.cpu_count()
    print(f'processes: {processes}')
    pool = multiprocessing.Pool(processes=processes)
    return pool

def calc(args):
    # pool.map passes each element of pairs as one argument
    return sum(args)

def calc_all():
    pool = setup()
    pairs = [[1,1], [2,2], [3,3]]
    results = pool.map(calc, pairs)
    print(results)

if __name__ == '__main__':
    calc_all()
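
If you'd rather keep calc(x, y) with two parameters, as in the question, a minimal sketch along these lines might help. It assumes the original error came from creating the pool at module level before calc was defined: under the fork start method, workers started at that point never see functions defined afterwards. It uses Pool.starmap to unpack each pair into two arguments. The name run_pairs is made up for illustration and isn't from the original code.

```python
import multiprocessing


def calc(x, y):
    # Defined at module level, before any pool exists, so workers can find it.
    return x + y


def run_pairs(pairs):
    # Hypothetical helper: the pool is created only when this function is called.
    # With no argument, Pool sizes itself from the CPU count.
    # The with-block terminates the workers on exit.
    with multiprocessing.Pool() as pool:
        # starmap unpacks each pair, so [1, 1] becomes calc(1, 1).
        return pool.starmap(calc, pairs)


if __name__ == '__main__':
    print(run_pairs([[1, 1], [2, 2], [3, 3]]))  # prints [2, 4, 6]
```

Keeping the pool out of module scope also means that importing the file from another module no longer starts worker processes as a side effect.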