python, python-multiprocessing

Start a new multiprocessing.Pool process for each one that dies


I'm using a Python API for a proprietary software package to run numerical simulations. I need to run quite a few, so I have tried to speed things up by using multiprocessing.Pool to run simulations in parallel. The simulations are independent, and the function passed to the pool returns nothing; the simulation results are saved to disk. As far as I understand, this should be similar to opening X terminals and running a call to the API from each.

Using multiprocessing starts off well: I can see all processors running at 100%, which is expected for the simulations. However, after a while the processes seem to die. Eventually I end up with no active processes but still have simulations that have not started. I think the problem is that the API is sometimes a little buggy. Certain errors cause the Python kernel to crash, and I think this is likely what is happening to the workers in my multiprocessing.Pool.

Is there a way that I can add a new process for each one that dies so that there will always be processes in the pool? For now I can run the individual simulations that give problems manually.

Below is a minimal working example, but I am not sure how to reproduce an error that causes the kernel to crash, so it is not of much use.

from multiprocessing import Pool
from multiprocessing import cpu_count
import time

def test_function(a, b):
    """Take two arguments (to justify starmap), pause, and return nothing."""
    print(f'running case {a}')
    # api(a, b) would run a simulation and save its output to disk here.
    # The real call "randomly" crashes the Python console/process.
    time.sleep(5)


if __name__ == '__main__':

    case_names = list(range(60))
    b = 'b'
    
    # All the inputs, in the argument order expected by test_function
    inputs = [(a, b) for a in case_names]

    start_time = time.time()
    
    # Never start more workers than there are cases to run
    no_processes = min(cpu_count(), len(inputs))

    print(f"Using {no_processes} processes on {cpu_count()} CPUs")

    # Pass the worker count explicitly so it matches the message above
    with Pool(processes=no_processes) as pool:
        result = pool.starmap(test_function, inputs)
    
    end_time = time.time()
    print(f'Total time {end_time-start_time}')

Solution

  • Thanks to everyone that commented/provided feedback. It looks like the problem is linked to a buggy API so I am going to park this for now.
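
  • For anyone landing here with the same problem, below is a minimal sketch of one possible workaround, not a tested fix for the API in question. It follows the "X terminals" idea from the question: every case runs in its own fresh Python interpreter via subprocess, a thread pool limits how many run at once, and a crash only shows up as a non-zero exit code for that one case. As far as I know, multiprocessing.Pool does start replacement workers when one dies, but the task that was running in the dead worker is lost, so starmap can end up waiting forever. The names CHILD_CODE, run_case and run_all are made up for this sketch, and the "crash" (os._exit for every case number divisible by 7) is a stand-in for whatever the real API does.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the real simulation script. It exits abruptly,
# like a crashed interpreter, for every case number divisible by 7.
CHILD_CODE = """
import os, sys
case = int(sys.argv[1])
# the real api(a, b) call would go here
if case % 7 == 0:
    os._exit(1)  # simulate a hard crash: no exception, no cleanup
"""


def run_case(case, b):
    """Run one case in a fresh interpreter and return (case, exit code)."""
    proc = subprocess.run([sys.executable, "-c", CHILD_CODE, str(case), b])
    return case, proc.returncode


def run_all(cases, b, max_workers=4):
    """Run all cases, at most max_workers at a time; return the failed cases."""
    # Threads are enough here: each one only waits for its child process.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(lambda c: run_case(c, b), cases))
    return [case for case, code in results if code != 0]


if __name__ == "__main__":
    failed = run_all(range(15), "b")
    print(f"Cases to rerun manually: {failed}")
```

    Because each child is a separate interpreter, a crash cannot take the parent or the other cases down with it, and the returned list tells you which simulations still need to be run by hand. The price is the start-up cost of one interpreter (and one API import) per case, which should be small next to a long simulation.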