I have a python script I run on HPC that takes a list of files in a text file and starts multiple SBATCH runs:
./launch_job.sh 0_folder_file_list.txt
launch_job.sh goes through 0_folder_file_list.txt and starts an SBATCH for each file
SAMPLE_LIST=`cut -d "." -f 1 $1`
for SAMPLE in $SAMPLE_LIST
do
echo "Getting accessions from $SAMPLE"
sbatch get_acc.slurm $SAMPLE
#./get_job.slurm $SAMPLE
done
get_job.slurm has all of my SBATCH information, module loads, etc. and performs
srun --mpi=pmi2 -n 5 python python_script.py ${SAMPLE}.txt
I don't want to start all of the jobs at one time, I would like them to run consecutively with a 24-hour maximum run time. I have already set my SBATCH -t to allow for a maximum time but I only want each job to run for a maximum of 24-hours. Is there a srun argument I can set that will accomplish this? Something else?
You can use --wait
flag with sbatch
.
-W, --wait Do not exit until the submitted job terminates. The exit code of the sbatch command will be the same as the exit code of the submitted job. If the job terminated due to a signal rather than a normal exit, the exit code will be set to 1. In the case of a job array, the exit code recorded will be the highest value for any task in the job array.
In your case,
for SAMPLE in $SAMPLE_LIST
do
echo "Getting accessions from $SAMPLE"
sbatch --wait get_acc.slurm $SAMPLE
done
So, the next sbatch
command will only be called after the first sbatch
finishes (your job ended or time limit reached).