I am launching 4 process via job script. These 4 processes launch successfully on the slurm host. But if the last process(4th one) crashed or get killed, the whole job get terminated. That's not the case with first 3 process. If I kill any one of first 3 process, the job stays there until all remaining processes complete their execution. But if I killed the last one, it terminates and kill first three processes as well.
my job_scrpt.sh contains: python process1 & process2 & process3 & process4
how does the job termination(in both successful and failed cases) works in slurm?
With
process1 &
process2 &
process3 &
process4
the script will terminate as soon as process4
terminates (it is the only one "blocking"); and so will the job.
You should write
process1 &
process2 &
process3 &
process4 &
wait
so as to send all four processes to the background, and the wait
command will block the script until all of them terminate, and so will the job.