
Run one sequential task after big MPI job in SLURM


I have a SLURM job that I launch with a batch script, for example:

#! /bin/bash -l

#SBATCH --job-name=job1
#SBATCH -o stdout.log
#SBATCH -e stderr.log
#SBATCH --ntasks=160

cd $WORK/job1

mpirun ./mympitask # 1.)

./collect_results  # 2.) long-running sequential task.

The first step (1.) runs in parallel using MPI. The second step (2.), however, needs only one task; the remaining tasks should be released so that I don't occupy them or waste CPU time.

Is it possible, for example, to:

a) release all tasks except one and run the final step on a single CPU?

b) specify a command that should be run after the sbatch job is done?

I was thinking about using a salloc call for the last step.


Solution

  • SLURM offers two options:

    1) Before running the sequential post-processing task, shrink the job to a single node:

    scontrol update job=$SLURM_JOBID NodeList=`hostname`
    

    This keeps only the node on which the batch script is running.

    I do not know whether, or how, the job can be shrunk to a single core.

    2) Another option is to submit two jobs, with the post-processing job depending on the MPI job:

    sbatch mpijob.slurm
    sbatch -d afterok:<mpijob SLURM jobid> postprocessing.slurm
    

    The non-trivial part (though it is not rocket science) is automatically retrieving the job ID of the first job.
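
One possible way to retrieve the job ID automatically is sbatch's --parsable option, which prints only the job ID (optionally followed by ";" and a cluster name) instead of the usual "Submitted batch job ..." message. The sketch below wraps the two submissions in a helper; the function name submit_chain is hypothetical, and the script names are the ones used in option 2.

```shell
#!/bin/bash
# Sketch: submit the MPI job, capture its job ID, then submit the
# post-processing job so that it starts only if the MPI job succeeds.
# submit_chain is a hypothetical helper name, not part of SLURM.
submit_chain() {
    local mpi_script=$1 post_script=$2
    local jobid

    # --parsable makes sbatch print "jobid" or "jobid;cluster" only.
    jobid=$(sbatch --parsable "$mpi_script") || return 1
    jobid=${jobid%%;*}   # drop the ";cluster" suffix if present

    # afterok: start only after the first job has completed successfully.
    sbatch --dependency=afterok:"$jobid" "$post_script"
}

# Usage: submit_chain mpijob.slurm postprocessing.slurm
```

With this approach the post-processing script can request a single task (for example --ntasks=1), so the 160 MPI tasks are released as soon as the first job ends.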