
Best practices to submit a huge number of jobs with Slurm


I need to submit several thousand jobs to our cluster. Each job takes around six hours to complete, so the whole set would take about a week even if I used all available resources. Theoretically I could do that, but then I would block all other users for a week, so this is not an option.

I have two ideas that could possibly solve the problem:

  • Create an array job and limit the maximum number of concurrently running tasks. I don't like this option because quite often (overnight, on weekends, etc.) no one uses the cluster, and my jobs cannot take advantage of those idle resources.
  • Submit all jobs at once but somehow give each job a very low priority. Ideally anyone could still use the cluster, because jobs they submit would start before mine. I do not know whether this is possible in Slurm, or whether I would have permission to do it.
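For reference, the first idea maps to Slurm's array-throttle syntax: `--array=<range>%<limit>` caps how many tasks of the array run at once. A minimal sketch of such a batch script, assuming 1000 tasks and a throttle of 50 (both placeholder values), might look like:

```shell
#!/bin/bash
# Hypothetical batch script for a throttled job array.
#SBATCH --job-name=big-sweep      # placeholder job name
#SBATCH --array=0-999%50          # 1000 tasks, at most 50 running at once
#SBATCH --time=06:30:00           # slightly above the ~6 h per-job runtime

# Each task selects its own input via the array index.
echo "Processing chunk ${SLURM_ARRAY_TASK_ID}"
```

The drawback the question mentions still applies: even if the cluster is otherwise empty, no more than 50 of these tasks will run at the same time.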

Is there a slurm mechanism I am missing? Is it possible to set priorities of a slurm job as described above and would I have permission to do that?


Solution

  • Generally this is a problem for the cluster admins. They should have configured the cluster in a way that prioritizes short and small jobs over long and large ones, and/or prevents large jobs from running on some nodes.

    However, you can also manually reduce the priority of your own jobs as a non-admin user with the nice factor option (a higher value means a lower priority):

    sbatch --nice=POSITIVE_NUMBER script.sh
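    As a concrete sketch (the nice value 10000 and job ID 12345 below are placeholders), the nice value can be set at submission time, and `scontrol update` can raise it on a job that is already queued; note that regular users can only increase the nice value of their own jobs, not decrease it:

    ```shell
    # Submit with a large nice value so other users' jobs start first.
    # 10000 is an arbitrary placeholder; any positive integer works.
    sbatch --nice=10000 script.sh

    # Raise the nice value of an already-pending job (12345 is a placeholder ID).
    scontrol update JobId=12345 Nice=10000
    ```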