I have a binary (say a.out) that I want to call with different configs, and I want to run these configs in parallel on a 40-core machine. Below is a sketch of my code.
It is very straightforward: I generate a config and pass it into a worker, and the worker calls the binary with the config using subprocess, redirecting the output to a file. Let's call this piece of code run.py:
import subprocess
from multiprocessing import Pool

def worker(cmdlist, filename):
    outputfile = open(filename, 'wb')
    # here it essentially executes: a.out config > outputfile 2>&1
    subprocess.call(cmdlist, stderr=outputfile, stdout=outputfile)
    outputfile.close()

def main():
    pool = Pool(processes=40)
    results = []
    for config in all_configs:
        filename, cmdlist = genCmd(config)
        res = pool.apply_async(worker, [cmdlist, filename])
        results.append(res)
    for res in results:
        res.get()
    pool.close()
But after I kicked it off, I realized that I am not spawning as many processes as I wanted. I definitely submitted more than 40 workers, but in top I only see about 20 instances of a.out.
I do see many run.py processes in the "sleeping" state (i.e., "S" in top). When I run ps auf, I also see a lot of run.py processes in the "S+" state with no binary spawned under them; only about half of them have spawned a.out.
I am wondering why this is happening. I am redirecting the output to a network-mounted hard drive, which could be a reason, but in top I only see about 10% wa (which, as I understand it, means 10% of the time is spent waiting for IO). I don't think that explains 50% of the CPUs sitting idle. Besides, the binary should at least get spawned instead of the job being stuck in run.py. My binary's runtime is also long enough; I should really be seeing 40 jobs running for a long time.
Is there any other explanation? Did I do anything wrong in my Python code?
An approach I have used to run many simultaneous processes at once across multiple cores is to use p = subprocess.Popen(...) and p.poll(). In your case I think you could skip Pool altogether. I'd give you a better example, but unfortunately I don't have access to that code anymore.
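Roughly, a minimal (untested) sketch of that idea might look like the following. It reuses genCmd and all_configs from your question; the 40-job cap and the 0.5-second polling interval are arbitrary choices on my part:

import subprocess
import time

MAX_JOBS = 40   # one job per core, matching your 40-core machine (assumption)
running = []    # list of (Popen handle, output file) pairs

def launch(cmdlist, filename):
    # essentially a.out config > filename 2>&1, without blocking the parent
    outputfile = open(filename, 'wb')
    p = subprocess.Popen(cmdlist, stdout=outputfile, stderr=outputfile)
    return p, outputfile

# genCmd and all_configs are taken from the question's code
pending = [genCmd(config) for config in all_configs]

while pending or running:
    # reap finished jobs: poll() returns None while the process is still alive
    still_running = []
    for p, f in running:
        if p.poll() is None:
            still_running.append((p, f))
        else:
            f.close()
    running = still_running

    # top up to MAX_JOBS concurrent processes
    while pending and len(running) < MAX_JOBS:
        filename, cmdlist = pending.pop(0)
        running.append(launch(cmdlist, filename))

    time.sleep(0.5)  # avoid busy-waiting while jobs run

Because the parent only launches and polls child processes, it stays cheap, and you always have up to MAX_JOBS copies of a.out running without depending on the multiprocessing pool's worker scheduling.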