Tags: bash, cluster-computing, slurm, sbatch

How to hold jobs for a user so that a total CPU count isn't exceeded in Slurm?


I am submitting a bunch of array jobs: 4 sets of 5 tasks with 8 CPUs each, so 4x5x8 = 160 CPUs in total. I would like to keep the number of CPUs in use at any one time below 100 (for example 2x5x8 + 1x2x8 = 96), because I need to let others run things. My whole research group is allowed 300 CPUs, but I want to stay below 100 so that I am not overstepping my share. How do I stop queued jobs from starting automatically if running them would push me over my self-imposed maximum of 100 CPUs?

My submit script is below and is run 4 times with different input parameters:

#!/bin/bash

#SBATCH --time=13-00:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=6000MB
#SBATCH --array=1-5

echo -e "Pwmm\tPwmw\tPmwm\tPmww\n$1\t$2\t$3\t$4" > "params_$SLURM_ARRAY_JOB_ID.txt";

mkdir Games-Surfing-Pwvw_1.0-Pwvv_-1.0-Pmvm_1.0-Pmvv_-1.0-Pvwv_0.9-Pvww_-0.9-Pvmv_0.9-Pvmm_-0.9-Pwmw_$2-Pwmm_$1-Pmwm_$3-Pmww_$4-T_20000000-K_100-M_200-Zone_175; 

/home/jmg367/JULIA/julia-1.8.0/bin/julia -t $SLURM_NTASKS $PWD/surf_probs_re_gill_games.jl $SLURM_NNODES $SLURM_NTASKS $SLURM_ARRAY_TASK_ID $SLURM_ARRAY_JOB_ID;

Solution

  • One possibility for you would be to submit two of the job arrays with a dependency upon the other two (--dependency=afterany:...), so that only two job arrays are running at a time. That makes at most 2x5x8 = 80 CPUs at a time.

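    # --parsable makes sbatch print only the job ID instead of "Submitted batch job <id>"
    job_array_id1=$(sbatch --parsable submit.sh A B C D)
    job_array_id2=$(sbatch --parsable submit.sh E F G H)
    # Each of these arrays starts only after every task of the named array has ended
    sbatch --dependency=afterany:"$job_array_id1" submit.sh I J K L
    sbatch --dependency=afterany:"$job_array_id2" submit.sh M N O P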
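
The dependency idea generalizes to any number of parameter sets by chaining the arrays into a fixed number of "lanes", where each array waits for the previous array in its lane. The following is only a minimal sketch: the function name `submit_in_lanes` and its argument layout are hypothetical, not part of Slurm. It relies on `sbatch --parsable` to get a bare job ID, and it assumes each parameter set is passed as one quoted, space-separated string.

```shell
#!/bin/bash
# Hypothetical helper (not a Slurm command): submit job arrays in LANES
# parallel chains, so that at most LANES arrays can run at the same time.
# Usage: submit_in_lanes LANES SCRIPT "params for array 1" "params for array 2" ...
submit_in_lanes() {
    local lanes=$1 script=$2
    shift 2
    local -a last_id=()   # ID of the most recently submitted array in each lane
    local i=0 lane jobid params
    for params in "$@"; do
        lane=$(( i % lanes ))
        if [[ -z ${last_id[lane]:-} ]]; then
            # First array in this lane: no dependency needed.
            # $params is intentionally unquoted so it splits into script arguments.
            jobid=$(sbatch --parsable "$script" $params)
        else
            # Later arrays wait until the previous array in the lane has ended.
            jobid=$(sbatch --parsable \
                --dependency=afterany:"${last_id[lane]}" "$script" $params)
        fi
        # On multi-cluster setups --parsable prints "jobid;cluster"; keep the ID only.
        last_id[lane]=${jobid%%;*}
        echo "lane $lane: submitted ${last_id[lane]}"
        i=$(( i + 1 ))
    done
}
```

For the four parameter sets above, `submit_in_lanes 2 submit.sh "A B C D" "E F G H" "I J K L" "M N O P"` would reproduce the two-at-a-time arrangement, which with 5 tasks of 8 CPUs per array stays at or below 2x5x8 = 80 CPUs.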