cluster-computing, slurm, sbatch

Iterative slurm job


I'm trying to optimize a study I'm doing. I currently have two job scripts, which I call step1 and step2. In step1:

#!/bin/bash

#SBATCH --output=slurm-%j.out
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=28
#SBATCH --time=24:00:00

module load <everything I need>



echo "Start of program at `date`"

srun $HOME/project/bin/my_executable1 ../data/my_datafile0.dat

echo "End of program at `date`"

After this job is done I have a new datafile, which we can call my_datafile1.dat, and this goes into the second job script, step2:

#!/bin/bash

#SBATCH --output=slurm-%j.out
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=28
#SBATCH --time=24:00:00

module load <everything I need>



echo "Start of program at `date`"

srun $HOME/project/bin/my_executable1 ../data/my_datafile1.dat

echo "End of program at `date`"

After this job I have a new datafile called my_datafile2.dat, which I use in step1 again, then the new output in step2, and so on. I'm wondering if there is a way to write a job script that does this iteration for me. I would like to tell it to do 20 iterations and end up with my_datafile1.dat, my_datafile2.dat, ..., my_datafile20.dat.


Solution

  • In a single job? If so, you could just use a loop, as follows (with a bit of verbosity in the inner loop, which can be replaced by a single line if wanted).

    Edit after the clarification in the comments below: basically, one step is an execution of my_executable1 followed by my_executable2. To simplify, let's call the output of my_executable1 A and the output of my_executable2 B:

    #!/bin/bash
    
    #SBATCH --output=slurm-%j.out
    #SBATCH --nodes=16
    #SBATCH --ntasks-per-node=28
    #SBATCH --time=24:00:00
    
    module load <everything I need>
    
    echo "Start of program at `date`"
    
    # 10 iterations, each running my_executable1 then my_executable2
    for I in $(seq 10); do
      CMD="srun $HOME/project/bin/my_executable1 ../data/my_datafile_A_${I}.dat"
      echo "Launching command \"$CMD\" at $(date)"
      eval $CMD
      CMD="srun $HOME/project/bin/my_executable2 ../data/my_datafile_B_${I}.dat"
      echo "Launching command \"$CMD\" at $(date)"
      eval $CMD
    done
    
    echo "End of program at `date`"
    

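    Submitted once with sbatch, this single job works through all ten iterations (twenty sub-steps in total) back to back. A minimal usage sketch, assuming the script above is saved as step_loop.sh (that filename is just an example):

    sbatch step_loop.sh    # one job ID covers the whole loop
    squeue -u $USER        # the iterations run inside a single job allocation
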
    If for some reason you really want to increment the index at each substep, you can use bc for the small computation:

    #!/bin/bash
    
    #SBATCH --output=slurm-%j.out
    #SBATCH --nodes=16
    #SBATCH --ntasks-per-node=28
    #SBATCH --time=24:00:00
    
    module load <everything I need>
    
    echo "Start of program at `date`"
    
    # 10 iterations; bc computes the running file index 1..20 across the two sub-steps
    for I in $(seq 10); do
      INDEX=$(echo "2*$I-1" | bc)   # odd index for my_executable1
      CMD="srun $HOME/project/bin/my_executable1 ../data/my_datafile_${INDEX}.dat"
      echo "Launching command \"$CMD\" at $(date)"
      eval $CMD
      INDEX=$(echo "2*$I" | bc)   # even index for my_executable2
      CMD="srun $HOME/project/bin/my_executable2 ../data/my_datafile_${INDEX}.dat"
      echo "Launching command \"$CMD\" at $(date)"
      eval $CMD
    done
    
    echo "End of program at `date`"