How to parallelize a process in a subclass in python


I am working on a project where I have a main function, a class called simulator, and a class called vehicle. The main function calls simulator.run(), which in turn calls vehicle.run() for every vehicle in simulator.vehicle_list.

The vehicles have to calculate a trajectory, which is a time-consuming process. These calculations are independent of each other, and I want the vehicle.run() calls to run on multiple CPU cores.

As a starter I have created a simple project (with different class names than above, but intuitive enough I hope), but I cannot get it to work. My code doesn't call self.calculate() in the subclass. When I remove the if __name__ == '__main__': statement from the master class, I get the error:

'multi_subclass' object has no attribute '_closed'

The three python files I have:

multi_main.py

from multi_class import multi_class

main_class = multi_class()
main_class.calculate()

#main_class.calculate2()

multi_class.py

from multi_subclass import multi_subclass

class multi_class():
    def __init__(self):
        print("I am multi_class")
        subclass_list = []
        for i in range(10):
            subclass_list += [multi_subclass()]
        self.subclass_list = subclass_list
        
    def calculate(self):
        if __name__ == '__main__':
            processes = []
            for i in range(10):
                processes.append(self.subclass_list[i])
                self.subclass_list[i].start()
                
            [proc.join() for proc in processes]

multi_subclass.py

from multiprocessing import Process

class multi_subclass(Process):
    def __init__(self):
        print("I am multi_subclass")
        
    def calculate(self):
        print("Running massive calculations")
        self.member_variable = "Finished with calculation"
        
    def calculate2(self):
        print("I also want to be parallelized but NOT together with calculate!")
        
    def run(self):
        self.calculate()

Solution

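  • Ok, so the if __name__ == '__main__': guard makes sure that code runs only when the file is executed as the main script, not when it is imported. That matters for multiprocessing: with the spawn start method (the default on Windows and, since Python 3.8, on macOS), every new process re-imports the main module, and without the guard those new processes would try to spawn new processes as well.

    So the guard belongs in your entry point, not inside a method of an imported class. Inside multi_class.py, __name__ is 'multi_class' rather than '__main__', so the guarded body of calculate() never executed. That is why self.calculate() in the subclass was never called.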
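A minimal sketch of why the guard behaves differently in the two files, using only the standard library. The guard_passes helper and the stand-in module object are hypothetical names made up for illustration; they are not part of the original project.

```python
import types


def guard_passes(module_name):
    """Return True if `if __name__ == '__main__':` would pass for this module name."""
    return module_name == "__main__"


# Hypothetical stand-in for multi_class.py as seen after
# `from multi_class import multi_class`: an imported module's __name__
# is the module's own name, never "__main__".
imported_module = types.ModuleType("multi_class")

print(imported_module.__name__)                # multi_class
print(guard_passes(imported_module.__name__))  # False
print(guard_passes("__main__"))                # True
```

Only the file you launch with `python multi_main.py` sees `__name__ == "__main__"`, which is why the guard is useful there and a silent no-op anywhere else.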

    multi_main.py

    from multi_class import multi_class
    from timeit import default_timer
    
    
    start = default_timer()
    
    if __name__ == '__main__':
    
        main_class = multi_class()
        main_class.calculate()
        print(f"Your code ran in {default_timer() - start} seconds instead of 10 seconds because of multiprocessing.LOL")
    

    multi_class.py <-- Class responsible for spinning up new processes. The issue in calculate() was the if __name__ == '__main__': guard, which is never true inside an imported module, so it has been removed.

    from multi_subclass import multi_subclass
    
    class multi_class():
        def __init__(self):
            print("I am multi_class")
            subclass_list = []
            for i in range(10):
                subclass_list += [multi_subclass()]
            self.subclass_list = subclass_list
    
        def calculate(self):
            processes = []
    
            for i in range(10):
                # The original loop body appended and started the same object,
                # so it was fine. It never ran because it sat behind the
                # `if __name__ == '__main__':` guard, which is false inside an
                # imported module. The guard is gone; the loop is otherwise
                # unchanged apart from naming the process first.
    
                # take the process, put it in the list, start it
                process = self.subclass_list[i]
                processes.append(process)
                process.start()
    
            # wait for all of them to complete here
            for proc in processes:
                proc.join()
    

    multi_subclass.py <-- Most important class. I liked the idea of inheriting from the Process class, so now your class is also a Process. Cool

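    But the problem was this: when you create a Process using the multiprocessing module, you normally provide a target to run when the process starts, like Process(target=abc), where abc is a function.

    In your code, the __init__ doesn't call the __init__ of the Process class, so the object is never set up as a process. That is where the "no attribute '_closed'" error comes from once start() is actually reached. I have added the call. Since run() is overridden, the target argument is not strictly required; the super().__init__() call is what matters.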

    from multiprocessing import Process
    import time
    class multi_subclass(Process):
        def __init__(self):
            print("I am multi_subclass")
            super().__init__(target=self.calculate)
    
        def calculate(self):
            print("Running massive calculations")
            time.sleep(1)
            self.member_variable = "Finished with calculation"
    
    
        def calculate2(self):
            print("I also want to be parallelized but NOT together with calculate!")
    
        def run(self):
            self.calculate()
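To see the effect of the missing super().__init__() call in isolation, here is a small sketch that starts no processes at all. BrokenWorker, FixedWorker and is_initialised are hypothetical names used only for this illustration. It relies on the fact that the public is_alive() method needs the internal state that Process.__init__ sets up.

```python
from multiprocessing import Process


class BrokenWorker(Process):
    def __init__(self):
        # Process.__init__ is never called, so the internal
        # bookkeeping attributes of Process do not exist.
        self.label = "broken"


class FixedWorker(Process):
    def __init__(self):
        super().__init__()  # sets up the Process internals
        self.label = "fixed"

    def run(self):
        print("Running massive calculations")


def is_initialised(worker):
    """Return True if the Process machinery was set up for this object."""
    try:
        worker.is_alive()  # public API; fails on a half-built Process
    except AttributeError:
        return False
    return True


print(is_initialised(BrokenWorker()))  # False
print(is_initialised(FixedWorker()))   # True
```

Calling start() on the broken variant fails for the same reason, before any child process is created.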
    

    OUTPUT

    I am multi_class
    I am multi_subclass
    I am multi_subclass
    I am multi_subclass
    I am multi_subclass
    I am multi_subclass
    I am multi_subclass
    I am multi_subclass
    I am multi_subclass
    I am multi_subclass
    I am multi_subclass
    Running massive calculations
    Running massive calculations
    Running massive calculations
    Running massive calculations
    Running massive calculations
    Running massive calculations
    Running massive calculations
    Running massive calculations
    Running massive calculations
    Running massive calculations
    Your code ran in 1.0859881550000001 seconds instead of 10 seconds because of multiprocessing.LOL
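One caveat about the code above: run() executes in the child process, so self.member_variable is set on the child's copy of the object and the parent never sees it. The following is a hedged sketch of one way to get results back and to run calculate and calculate2 as separate batches, assuming a multiprocessing.Queue is acceptable for your data. Worker, run_batch, worker_id and method_name are hypothetical names, not part of the original code.

```python
from multiprocessing import Process, Queue


class Worker(Process):
    """Hypothetical stand-in for multi_subclass."""

    def __init__(self, worker_id, method_name, results):
        super().__init__()
        self.worker_id = worker_id
        self.method_name = method_name  # which method run() should call
        self.results = results          # queue shared with the parent

    def calculate(self):
        return "Finished with calculation"

    def calculate2(self):
        return "Finished with calculation 2"

    def run(self):
        # Runs in the child process: send the result back through the
        # queue instead of storing it on self, where the parent cannot see it.
        outcome = getattr(self, self.method_name)()
        self.results.put((self.worker_id, outcome))


def run_batch(method_name, count=3):
    """Run one method on `count` workers in parallel and collect the results."""
    results = Queue()
    workers = [Worker(i, method_name, results) for i in range(count)]
    for worker in workers:
        worker.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    collected = dict(results.get() for _ in workers)
    for worker in workers:
        worker.join()
    return collected


if __name__ == "__main__":
    print(run_batch("calculate"))   # first batch
    print(run_batch("calculate2"))  # second batch, only after the first is done
```

Each call to run_batch finishes before the next one begins, so calculate2 is parallelized across workers but never runs together with calculate.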