
Python multiprocessing: AttributeError: Can't get attribute <function_name> on module


The following code:

import numpy as np, pandas as pd
import multiprocessing, itertools, timeit
from functools import partial

processes = 5 * multiprocessing.cpu_count()
print(f'processes: {processes}')
pool = multiprocessing.Pool(processes=processes)

def calc(x, y):
   return x+y

def calc_all():
   pairs = [[1,1], [2,2], [3,3]]
   results = pool.map(calc, pairs)
   print(results)

if __name__ == '__main__':
   calc_all()

fails in the worker processes with:

  File "/usr/local/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/multiprocessing/pool.py", line 114, in worker
    task = get()
           ^^^^^
  File "/usr/local/lib/python3.12/multiprocessing/queues.py", line 389, in get
    return _ForkingPickler.loads(res)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't get attribute 'calc' on <module '__main__' from '/workspaces/calc.py'>

If I move just the if __name__ == '__main__': block to a separate file and import calc_all, I still get the same error (with a different module name, of course).

If I move calc_all and the main block to another module and import calc, it works fine.

I would like to understand why this happens when both are top-level functions. Is there a better way to solve this than moving part of the module to a separate file?


Solution

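  • I'm not able to replicate it; I get into a weird loop instead. When asking questions like this, include your environment: Python version, OS, etc.

    Also, there's no point in spawning 5 * cpu_count() processes: for CPU-bound work, workers beyond the number of CPUs just add start-up and context-switching overhead.

    It works fine when I clean it up. The pool is now created inside a function that only runs under the if __name__ == '__main__': guard, instead of at import time, and calc takes a single argument because pool.map passes each pair as one object: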

    import multiprocessing


    def setup():
        # One worker per CPU; extra workers only add overhead for CPU-bound work
        processes = multiprocessing.cpu_count()
        print(f'processes: {processes}')
        return multiprocessing.Pool(processes=processes)


    def calc(args):
        # pool.map passes each item (here an [x, y] list) as a single argument
        return sum(args)


    def calc_all():
        # The pool is created here, at call time, not when the module is imported
        with setup() as pool:
            pairs = [[1, 1], [2, 2], [3, 3]]
            results = pool.map(calc, pairs)
        print(results)


    if __name__ == '__main__':
        calc_all()
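A likely explanation of the original error, assuming the Linux setup that the /workspaces path and the Python 3.12 traceback suggest: there, Pool workers are started with the fork start method, and they are forked as soon as Pool(...) is called. In the question that call happens at import time, before def calc has run, so each worker's copy of __main__ has no calc. pool.map pickles calc by reference (module name plus function name, not its code), and the worker's lookup of __main__.calc then fails. This also explains why moving only the main block elsewhere doesn't help: importing the module still creates the pool before calc is defined. The sketch below reproduces just the pickling half without multiprocessing. The names fake_main, make_module_with_calc and unpickle_after_removal are hypothetical, and deleting calc stands in for the forked worker's stale module copy.

```python
import pickle
import sys
import types


def make_module_with_calc(name):
    """Create and register a throwaway module (hypothetical) defining calc."""
    module = types.ModuleType(name)
    sys.modules[name] = module  # pickle resolves module names via sys.modules
    exec("def calc(x, y):\n    return x + y\n", module.__dict__)
    return module


def unpickle_after_removal(module):
    """Pickle module.calc, delete it, then try to unpickle it again.

    Returns the pickled bytes and the AttributeError raised (or None).
    """
    payload = pickle.dumps(module.calc)  # stores only "module name + calc"
    del module.calc  # simulate a worker whose module copy predates calc
    try:
        pickle.loads(payload)
    except AttributeError as exc:
        return payload, exc
    return payload, None


if __name__ == "__main__":
    demo = make_module_with_calc("fake_main")
    payload, error = unpickle_after_removal(demo)
    print(b"fake_main" in payload and b"calc" in payload)  # True
    print(type(error).__name__)  # AttributeError
```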
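If you would rather keep calc(x, y) with two parameters, Pool.starmap unpacks each pair into separate arguments. The sketch below combines that with creating the pool inside a function. As an assumption, it requests the fork start method explicitly via multiprocessing.get_context("fork") to mirror the question's Linux setup. fork is unavailable on Windows, and the default start method differs across platforms and Python versions. processes=2 is an arbitrary choice.

```python
import multiprocessing


def calc(x, y):
    return x + y


def calc_all(pairs):
    """Add up each [x, y] pair in worker processes."""
    # Assumption: the "fork" start method, as in the question's environment;
    # it raises ValueError on platforms without fork (e.g. Windows).
    ctx = multiprocessing.get_context("fork")
    # The pool is created at call time, after calc is defined, so every
    # forked worker inherits a __main__ that already contains calc.
    with ctx.Pool(processes=2) as pool:
        # starmap calls calc(x, y) for each pair instead of calc([x, y])
        return pool.starmap(calc, pairs)


if __name__ == "__main__":
    print(calc_all([[1, 1], [2, 2], [3, 3]]))  # [2, 4, 6]
```

With the spawn start method (the default on Windows and macOS), the same layout should also work without the explicit context, as long as pool creation stays behind the __main__ guard. Spawned workers re-import the main module, so any pool created at module level would be created again in every worker.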