Search code examples
pythonpython-multiprocessingtorch

Passing objects in python multiprocess.spawn


I have 50 processes I want to run in parallel. I need to run the processes on a gpu. My machine has 8 gpus, I pass the device number to each process so it knows what device to run on. Once that processes is done I want to run another process on that device. The processes are run as subprocesses using POpen with the command below

python special_process.py device

A simple way to do this would be

for group in groups:
    processes = [subprocess.POpen(f'python special_process.py {device}'.split()) for device in range(8)]
    [p.wait() for p in process]

where groups, are the 50 processes split into groups of 8.

The downside of this is some processes take longer than others and all processes need to finish before it moves to the next group.

I was hoping to do something like multiprocess.spawn, but I need the last process to return the device number so it is clear which device is open to run on. I tried using Queue and Process from multiprocessing but I can't get more than 1 process to run at once.

Any help would be very appreciated. Thanks


Solution

  • Simple while loop and building your own queue worked. Just don't use wait until the end.

    import subprocess
    
    d = list(range(20))
    num_gpus = 8
    procs = []
    gpus_free = set([j for j in range(num_gpus)])
    gpus_used = set()
    
    while len(d) > 0:
        for proc, gpu in procs:
            poll = proc.poll()
            if poll is None:
                # Proc still running
                continue
            else:
                # Proc complete - pop from list
                procs.remove((proc, gpu))
                gpus_free.add(gpu)
    
        # Submit new processes
        if len(procs) < num_gpus:
            this_process = d.pop()
            gpu_for_this_process = gpus_free.pop()
            command = f"python3 inner_function.py {gpu_for_this_process} {this_process}"
            proc = subprocess.Popen(command, shell= True)
            procs.append((proc, gpu_for_this_process))
    
    [proc.wait() for proc, _ in procs]
    print('DONE with all')