I have 50 processes I want to run in parallel, and each needs to run on a GPU. My machine has 8 GPUs, and I pass a device number to each process so it knows which device to run on. Once a process is done, I want to run another process on that device. The processes are run as subprocesses using Popen with the command below:
python special_process.py device
A simple way to do this would be
for group in groups:
    processes = [subprocess.Popen(f'python special_process.py {device}'.split()) for device in range(8)]
    [p.wait() for p in processes]
where groups are the 50 processes split into groups of 8.
The downside of this is that some processes take longer than others, and every process in a group has to finish before the next group starts.
I was hoping to do something like multiprocess.spawn, but I need whichever process finishes to return its device number so it is clear which device is open to run on. I tried using Queue and Process from multiprocessing, but I can't get more than one process to run at once.
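For reference, the shape I was going for was roughly this, with each worker pulling a free device number off a shared Queue and handing it back when its subprocess finishes (just a sketch of the idea, not my actual code):

import subprocess
from multiprocessing import Process, Queue

def worker(device_queue):
    device = device_queue.get()    # block until a device number is free
    subprocess.run(f"python special_process.py {device}", shell=True)
    device_queue.put(device)       # hand the device back for the next job

if __name__ == "__main__":
    device_queue = Queue()
    for device in range(8):
        device_queue.put(device)

    workers = [Process(target=worker, args=(device_queue,)) for _ in range(50)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()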
Any help would be much appreciated. Thanks.
A simple while loop and building your own queue worked. Just don't call wait() until the end.
import subprocess
import time

d = list(range(20))                    # job ids still waiting to run
num_gpus = 8
procs = []                             # (Popen handle, gpu) pairs currently running
gpus_free = set(range(num_gpus))       # device numbers with nothing running on them

while len(d) > 0:
    # Reap finished processes and hand their GPUs back
    for proc, gpu in procs[:]:         # iterate over a copy, since we mutate procs
        poll = proc.poll()
        if poll is None:
            # Proc still running
            continue
        else:
            # Proc complete - pop from list and free its GPU
            procs.remove((proc, gpu))
            gpus_free.add(gpu)
    # Submit a new process if a GPU is free
    if len(procs) < num_gpus:
        this_process = d.pop()
        gpu_for_this_process = gpus_free.pop()
        command = f"python3 inner_function.py {gpu_for_this_process} {this_process}"
        proc = subprocess.Popen(command, shell=True)
        procs.append((proc, gpu_for_this_process))
    time.sleep(0.1)                    # avoid a busy spin while polling

# Only wait on the last batch, after all jobs have been submitted
[proc.wait() for proc, _ in procs]
print('DONE with all')
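If you'd rather not hand-roll the polling loop, the same scheduling can also be done with one plain thread per GPU pulling job ids off a queue.Queue, since each thread just blocks on its subprocess anyway. A sketch, reusing the inner_function.py command from above:

import queue
import subprocess
import threading

num_gpus = 8
jobs = queue.Queue()
for job in range(20):
    jobs.put(job)

def gpu_worker(gpu):
    while True:
        try:
            job = jobs.get_nowait()    # grab the next pending job id
        except queue.Empty:
            return                     # nothing left to run on this GPU
        subprocess.run(f"python3 inner_function.py {gpu} {job}", shell=True)

threads = [threading.Thread(target=gpu_worker, args=(gpu,)) for gpu in range(num_gpus)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print('DONE with all')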