python slurm hpc sbatch

Run multiple files consecutively via SLURM with individual timeout


I run a Python script on an HPC cluster through a launcher script that takes a list of files in a text file and starts multiple SBATCH runs:

./launch_job.sh 0_folder_file_list.txt

launch_job.sh goes through 0_folder_file_list.txt and submits an SBATCH job for each file:

# Strip the extension from each entry in the list file
SAMPLE_LIST=$(cut -d "." -f 1 "$1")

for SAMPLE in $SAMPLE_LIST
do
  echo "Getting accessions from $SAMPLE"
  sbatch get_acc.slurm "$SAMPLE"
  #./get_job.slurm $SAMPLE
done

get_job.slurm has all of my SBATCH directives, module loads, etc., and runs:

srun --mpi=pmi2 -n 5 python python_script.py ${SAMPLE}.txt

I don't want to start all of the jobs at once. I would like them to run consecutively, each with a 24-hour maximum run time. I have already set SBATCH -t to allow for a maximum time, but I want each individual job to run for at most 24 hours. Is there an srun argument that will accomplish this, or some other approach?


Solution

  • You can use the --wait flag with sbatch.

    -W, --wait Do not exit until the submitted job terminates. The exit code of the sbatch command will be the same as the exit code of the submitted job. If the job terminated due to a signal rather than a normal exit, the exit code will be set to 1. In the case of a job array, the exit code recorded will be the highest value for any task in the job array.

    In your case,

    for SAMPLE in $SAMPLE_LIST
    do
      echo "Getting accessions from $SAMPLE"
      sbatch --wait get_acc.slurm $SAMPLE
    done
    

    So the next sbatch command is only called after the previous one finishes (either the job ended or its time limit was reached).
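The loop above can also carry the 24-hour cap from the question. The following is a minimal sketch, not a tested site configuration. It assumes that passing --time on the sbatch command line takes precedence over an #SBATCH -t line inside get_acc.slurm, and that the partition permits a 24-hour limit. The names submit_consecutively and SBATCH_CMD are hypothetical and used only for illustration. SBATCH_CMD exists so the loop can be dry-run with a stand-in command.

```shell
#!/bin/sh
# Sketch: submit one job per sample, strictly one after another,
# with each job capped at 24 hours of wall time.

# Hypothetical override hook: defaults to the real sbatch.
SBATCH_CMD="${SBATCH_CMD:-sbatch}"

# Hypothetical helper: $1 is the list file (one file name per line).
submit_consecutively() {
  # Keep the part of each line before the first ".", as the original cut does.
  cut -d "." -f 1 "$1" | while IFS= read -r sample; do
    # Skip blank lines in the list file.
    [ -n "$sample" ] || continue
    echo "Getting accessions from $sample"
    # --wait blocks until the job terminates; --time sets the per-job limit.
    # stdin is redirected so the submit command cannot consume the sample list.
    "$SBATCH_CMD" --wait --time=24:00:00 get_acc.slurm "$sample" < /dev/null \
      || echo "Job for $sample ended with a non-zero status" >&2
  done
}

# Usage: submit_consecutively 0_folder_file_list.txt
```

Because sbatch --wait returns the job's exit status (1 if the job was killed by a signal), the warning branch fires for a failed job and the loop moves on to the next sample rather than stopping.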