My code launches a number of processes, each running another Python script, using subprocess.Popen, and then waits for all of them to finish. Sometimes (not on every run, even with identical parameters) one or more of the processes does not seem to run at all: the first thing the called script does is create a log file, and in those cases the log file is never created.
    for fname1 in input_files_1:
        cmd_list = create_batch_cmd(fname1)
        print(' '.join(cmd_list))
        p_list.append(Popen(' '.join(cmd_list), shell=True))
        logging.info("started process for %s" % fname1)
    logging.info("waiting for processes")
    ps_status = []
    while True:
        ps_status = [p.poll() for p in p_list]
        if all([x is not None for x in ps_status]):
            break
    logging.info("all processes finished")
    print(ps_status)
ps_status is usually a list of zeros. In one run of 12 processes in which 3 failed, ps_status was:
[0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0]
What could cause this, and how can I investigate it? Thanks.
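Your processes number 3, 8 and 10 (counting from 1) failed: they returned exit status 1, which conventionally signals an error. It is probably not a Python problem, since all the others seem to be working fine. Also, maybe you should use p.wait() instead of polling in a loop with poll().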
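One way to investigate is to capture each child's stderr and wait for it, instead of busy-polling. The following is only a minimal sketch, not the asker's actual setup: run_all is a hypothetical helper, the two commands are stand-ins for whatever create_batch_cmd returns, and it assumes Python 3 and commands passed as argument lists rather than through shell=True.

```python
import subprocess
import sys


def run_all(commands):
    """Start every command, then wait for each one.

    Returns a list of (returncode, stderr_text) tuples, in start order.
    """
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        for cmd in commands
    ]
    results = []
    for p in procs:
        # communicate() blocks until the child exits and reads both pipes
        _out, err = p.communicate()
        results.append((p.returncode, err.decode(errors="replace")))
    return results


# Hypothetical stand-ins for the real batch commands:
# one child that succeeds and one that writes to stderr and exits with 1.
commands = [
    [sys.executable, "-c", "print('ok')"],
    [sys.executable, "-c", "import sys; sys.stderr.write('boom\\n'); sys.exit(1)"],
]

results = run_all(commands)
for i, (code, err) in enumerate(results):
    if code != 0:
        print("process %d exited with %d: %s" % (i, code, err.strip()))
```

If a child dies before it creates its log file, whatever it printed to stderr (a traceback, an import error, a "file not found" message from the shell) is usually the quickest clue to the cause.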