
AttributeError: Can't pickle local object in Multiprocessing


I am very new to Python and I have run into this error.

CODE 1 :

import multiprocessing as mp
import os
 
def calc(num1, num2):
    def addi(num1, num2):
        print(num1+num2)
    m = mp.Process(target = addi, args = (num1, num2))
    m.start()

    print("here is main", os.getpid())
    m.join()
  
if __name__ == "__main__":
    # creating processes
    calc(5, 6)

ERROR 1 :

ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'calc.<locals>.addi'

After reading around a little, I understand that pickle cannot serialize local functions, so I also tried the solution below (declaring addi as global), which gave a different error.

CODE 2 :

import multiprocessing as mp
import os
   
def calc(num1, num2):
    global addi
    def addi(num1, num2):
        print(num1+num2)
    m = mp.Process(target = addi, args = (num1, num2))
    m.start()

    print("here is main", os.getpid())
    m.join()
  
if __name__ == "__main__":
    # creating processes
    calc(5, 6)

ERROR 2 :

self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'addi' on <module '__mp_main__' from '/Users

Could someone please help me out with this? I have no idea what to do next! I am using Python 3.8.9.

Thank you so much!


Solution

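  • You are getting this error because multiprocessing uses pickle to hand the target function to the new process, and pickle serializes functions by reference: it stores only the module and qualified name, so in general it can only handle functions defined at the top level of a module. addi is defined inside calc, so it is not one of them. The line global addi does not help either: it binds the name in the module's globals only when calc actually runs, and with the spawn start method (the default on macOS since Python 3.8, and the only one on Windows) the child process re-imports your script as __mp_main__ without calling calc, so addi does not exist there. That is the lookup that fails in ERROR 2. So there are three ways to fix this.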
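
Before going through the fixes, the restriction itself can be checked without starting any process. The following is a minimal sketch, not part of the original code: can_pickle and make_local are hypothetical helper names, and json.dumps merely stands in for any function that lives at the top level of an importable module.

```python
import json
import pickle


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        # Depending on the Python version, a local function raises
        # AttributeError ("Can't pickle local object ...") or PicklingError.
        return False


def make_local():
    # Same shape as addi inside calc: a function defined in another function.
    def addi(num1, num2):
        print(num1 + num2)
    return addi


if __name__ == "__main__":
    # json.dumps is a module-level function, so pickle stores it by reference.
    print(can_pickle(json.dumps))
    # The nested function's qualified name contains "<locals>", so the
    # name lookup that pickle relies on cannot succeed.
    print(make_local().__qualname__)
    print(can_pickle(make_local()))
```

The "<locals>" part of the qualified name is the same marker that appears in ERROR 1.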

    Method 1

    You can define addi at the top level of the module, outside the calc function:

    import multiprocessing as mp
    import os
    
    
    def addi(num1, num2):
        print(num1 + num2)
    
    def calc(num1, num2):
    
        m = mp.Process(target=addi, args=(num1, num2))
        m.start()
    
        print("here is main", os.getpid())
        m.join()
    
    
    if __name__ == "__main__":
        # creating processes
        calc(5, 6)
    

    Output

    here is main 9924
    11
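
Method 1 is straightforward when the nested function uses only its own arguments. If it also captured variables from calc, one option is to pass those values explicitly and bind them with functools.partial, which can be pickled as long as the wrapped function and the bound arguments can. This is a minimal sketch under that assumption; the offset value and the names add_with_offset and build_target are hypothetical and do not come from the question.

```python
import functools
import multiprocessing as mp
import os


def add_with_offset(offset, num1, num2):
    # Top-level function: the value a closure would have captured (offset)
    # is passed in as an ordinary argument instead.
    total = offset + num1 + num2
    print(total)
    return total


def build_target(offset):
    # Bind the would-be captured value up front; the result is still
    # picklable because add_with_offset is defined at module level.
    return functools.partial(add_with_offset, offset)


def calc(num1, num2, offset=0):
    m = mp.Process(target=build_target(offset), args=(num1, num2))
    m.start()

    print("here is main", os.getpid())
    m.join()


if __name__ == "__main__":
    # creating processes
    calc(5, 6, offset=100)
```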
    

    Method 2

    You can switch to multiprocess, a third-party fork of multiprocessing that uses dill instead of pickle and can therefore serialize such functions.

    import multiprocess as mp  # Note that we are importing "multiprocess", no "ing"!
    import os
    
    def calc(num1, num2):
    
        def addi(num1, num2):
            print(num1 + num2)
    
        m = mp.Process(target=addi, args=(num1, num2))
        m.start()
    
        print("here is main", os.getpid())
        m.join()
    
    
    if __name__ == "__main__":
        # creating processes
        calc(5, 6)
    

    Output

    here is main 67632
    11
    

    Method 2b

    While it's a useful library, there are valid reasons why you may not want to use multiprocess. A big one is that the standard library's multiprocessing and this fork are not compatible with each other (especially if you use anything from the subpackage multiprocessing.managers). This means that if you use the fork in your own project, but also use third-party libraries that themselves use the standard library's multiprocessing, you may see unexpected behaviour.

    If you want to stick with the standard library's multiprocessing and not use the fork, you can use dill yourself to serialize local functions such as addi by subclassing the Process class and adding some logic of your own. An example is given below:

    import dill
    from multiprocessing import Process  # Use the standard library only
    import os
    
    class DillProcess(Process):
    
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self._target = dill.dumps(self._target)  # Save the target function as bytes, using dill
    
        def run(self):
            if self._target:
                self._target = dill.loads(self._target)    # Unpickle the target function before executing
                self._target(*self._args, **self._kwargs)  # Execute the target function
    
    
    def calc(num1, num2):
    
        def addi(num1, num2):
            print(num1 + num2)
    
        m = DillProcess(target=addi, args=(num1, num2))  # Note how we use DillProcess, and not multiprocessing.Process
        m.start()
    
        print("here is main", os.getpid())
        m.join()
    
    
    if __name__ == "__main__":
        # creating processes
        calc(5, 6)
    

    Output

    here is main 23360
    11
    

    Method 3

    This method is for those who cannot use any third-party libraries in their code. I recommend trying the methods above first, because this one is a little hacky and requires restructuring some of your code.

    This method works by making your local functions reachable from the module's top level, so that pickle can find them by name. To do this dynamically, we create a placeholder class and add all the local functions as its class attributes. We also need to change each function's __qualname__ attribute to point to its new location, and to do all of this on every run outside the if __name__ ... block (otherwise newly started processes, which re-import the module, won't see the attributes). Consider a slightly modified version of your code:

    import multiprocessing as mp
    import os
    
    def calc(num1, num2):
    
        def addi(num1, num2):
            print(num1 + num2)
    
        # Another local function you might have
        def addi2():
            print('hahahaha')
    
        m = mp.Process(target=addi, args=(num1, num2))
        m.start()
    
        print("here is main", os.getpid())
        m.join()
    
    
    if __name__ == "__main__":
        # creating processes
        calc(5, 6)
    

    Below is how you can make it work using the method described above:

    import multiprocessing as mp
    import os
    
    
    # This is our placeholder class; all local functions will be added as its attributes
    class _LocalFunctions:
        @classmethod
        def add_functions(cls, *args):
            for function in args:
                setattr(cls, function.__name__, function)
                function.__qualname__ = cls.__qualname__ + '.' + function.__name__
    
    
    def calc(num1, num2, _init=False):
        # The _init parameter lets you register the local functions outside the __main__ block without running
        # the rest of calc. Move all local function definitions to the top and add them to the _LocalFunctions
        # class. If _init is True, the call exists only to register the local functions, so return right after
        # registering them and do nothing else (see below).
    
        def addi(num1, num2):
            print(num1 + num2)
    
        # Another local function you might have
        def addi2():
            print('hahahaha')
    
        # Add all functions to _LocalFunctions class, separating each with a comma:
        _LocalFunctions.add_functions(addi, addi2)
    
        # IMPORTANT: return and don't actually execute the logic of the function if _init is True!
        if _init is True:
            return
    
        # Beyond here is where you put the function's actual logic including any assertions, etc.
        m = mp.Process(target=addi, args=(num1, num2))
        m.start()
    
        print("here is main", os.getpid())
        m.join()
    
    
    # All factory functions must be initialized BEFORE the "if __name__ ..." clause. If they require any parameters,
    # substitute with bogus ones and make sure to put the _init parameter value as True!
    calc(0, 0, _init=True)
    
    if __name__ == '__main__':
        calc(5, 6)
    

    So there are a few things you need to change in your code: all local functions must be defined at the top of their enclosing function, and every factory function must accept the _init parameter and be initialized outside the if __name__ ... clause. But this is probably the best you can do if you can't use dill.