
Slurm job arrays: is there a way to create a job array whose tasks start at different times?


I have a long-running task that I want to run as a job array on Slurm.

The script I am currently using to submit the jobs is:

    #!/bin/bash

    #SBATCH --output=slurm-%A_%a.out
    #SBATCH --array=1-30
    #SBATCH --ntasks=1
    #SBATCH --qos=qos-15d
    #SBATCH --partition=large
    #SBATCH --mem=4G

    srun ./a

This script works fine, but my problem is that, since it is an array of 30 jobs, I need to start the first one at time X, the second one some minutes later, and so on. I want to do this because the program I am simulating is a compiled C program that seeds its random number generator with srand(time(0)). Because all array tasks start in the same second, they all get the same seed, so the script above produces identical results for all 30 simulations. As each simulation takes a long time to run, it is not feasible for me to wait for one job to complete before starting the next.


Solution

  • Assuming you are the only one using the cluster (otherwise the jobs are unlikely to all start at the same time anyway), one simple trick is to add a random sleep at the beginning of your script:

    #!/bin/bash
    
    #SBATCH --output=slurm-%A_%a.out
    #SBATCH --array=1-30
    #SBATCH --ntasks=1
    #SBATCH --qos=qos-15d
    #SBATCH --partition=large
    #SBATCH --mem=4G
    
    # Wait a random 1 to 30 seconds so that time(0) differs between tasks.
    sleep $((RANDOM % 30 + 1))
    
    srun ./a
    

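    Even if all jobs start at the same time, each one sleeps for a random time (from 1 to 30 seconds) before actually starting the computation, so time(0) usually differs between tasks. With only 30 possible delays, two tasks can still draw the same delay and end up with the same seed.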
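A variation on the same idea, offered only as a sketch: instead of a random delay, derive the delay from the array index, which Slurm exposes to each task as the SLURM_ARRAY_TASK_ID environment variable. The helper name stagger_delay and the 2-second spacing are assumptions made for illustration, not part of Slurm or of the original answer. Like the random sleep, this only separates the seeds when the tasks really do start at the same moment; tasks that are queued and start later may still collide.

```shell
#!/bin/bash

#SBATCH --output=slurm-%A_%a.out
#SBATCH --array=1-30
#SBATCH --ntasks=1
#SBATCH --qos=qos-15d
#SBATCH --partition=large
#SBATCH --mem=4G

# Hypothetical helper (not part of Slurm): map an array task ID to a
# delay in seconds. Task 1 waits 0 s, task 2 waits one step, and so on,
# so no two task IDs share a delay.
stagger_delay() {
    local task_id=$1
    local step=${2:-2}   # assumed spacing between tasks, in seconds
    echo $(( (task_id - 1) * step ))
}

# SLURM_ARRAY_TASK_ID is set by Slurm inside each array task. The guard
# keeps the script from sleeping or calling srun outside a job array.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    sleep "$(stagger_delay "$SLURM_ARRAY_TASK_ID")"
    srun ./a
fi
```

With these assumed values, task 30 waits 58 seconds, which is negligible next to a simulation that runs for days. The spacing is kept above one second because time(0) only has one-second resolution.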