I have a binary (say a.out) that I want to call with different configs, and I want to run these configs in parallel on a 40-core machine. Below is a sketch of my code.
It is very straightforward: I generate a config and pass it into a worker, and the worker calls the binary with the config using subprocess, redirecting the output to a file. Let's call this piece of code run.py:
import subprocess
from multiprocessing import Pool

def worker(cmdlist, filename):
    outputfile = open(filename, 'wb')
    # here it essentially executes: a.out config > outputfile 2>&1
    subprocess.call(cmdlist, stderr=outputfile, stdout=outputfile)
    outputfile.close()

def main():
    pool = Pool(processes=40)
    results = []
    for config in all_configs:
        filename, cmdlist = genCmd(config)
        res = pool.apply_async(worker, [cmdlist, filename])
        results.append(res)
    for res in results:
        res.get()
    pool.close()
But after I kicked it off, I realized that I am not spawning as many processes as I wanted. I definitely submitted more than 40 workers, but in top I only see about 20 instances of a.out.
I do see many run.py processes in the "sleeping" state (i.e., "S" in top). When I run ps auf, I also see a lot of run.py processes in the "S+" state with no binary spawned under them; only about half of them have spawned a.out.
I am wondering why this is happening. I am redirecting the output to a network-mounted hard drive, which could be a reason, but in top I only see about 10% wa (which, as I understand it, means 10% of the time is spent waiting for IO). I don't think that explains 50% of the CPUs sitting idle. Besides, the binary should at least get spawned instead of the job being stuck in run.py. My binary's runtime is also long enough; I should really be seeing 40 jobs running for a long time.
Is there any other explanation? Did I do anything wrong in my Python code?
An approach I have used to run many simultaneous processes at once across multiple cores is to use p = subprocess.Popen(...) and p.poll(). In your case I think you could skip Pool altogether. I'd give you a better example, but unfortunately I don't have access to that code anymore.
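Roughly, a minimal (untested) sketch of that idea might look like the following. It reuses genCmd and all_configs from your question; the 40-job cap and the 0.5-second polling interval are arbitrary choices on my part:

import subprocess
import time

MAX_JOBS = 40   # one job per core, matching your 40-core machine (assumption)
running = []    # list of (Popen handle, output file) pairs

def launch(cmdlist, filename):
    # essentially a.out config > filename 2>&1, without blocking the parent
    outputfile = open(filename, 'wb')
    p = subprocess.Popen(cmdlist, stdout=outputfile, stderr=outputfile)
    return p, outputfile

# genCmd and all_configs are taken from the question's code
pending = [genCmd(config) for config in all_configs]

while pending or running:
    # reap finished jobs: poll() returns None while the process is still alive
    still_running = []
    for p, f in running:
        if p.poll() is None:
            still_running.append((p, f))
        else:
            f.close()
    running = still_running

    # top up to MAX_JOBS concurrent processes
    while pending and len(running) < MAX_JOBS:
        filename, cmdlist = pending.pop(0)
        running.append(launch(cmdlist, filename))

    time.sleep(0.5)  # avoid busy-waiting while jobs run

Because the parent only launches and polls child processes, it stays cheap, and you always have up to MAX_JOBS copies of a.out running without depending on the multiprocessing pool's worker scheduling.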