Tags: slurm, sbatch

sbatch sends compute node to 'drained' status


On newly installed and configured compute nodes in our small cluster, I am unable to submit Slurm jobs using a batch script and the 'sbatch' command. After submitting, the requested node changes to the 'drained' state. However, I can run the same command interactively using 'srun'.

Works:
srun -p debug --ntasks=1 --nodes=1 --job-name=test --nodelist=node6 -l echo 'test'

Does not work:
sbatch test.slurm
with test.slurm:

#!/bin/sh
#SBATCH --job-name=test
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --nodelist=node6
#SBATCH --partition=debug

echo 'test'

sinfo then gives me:

PARTITION  AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug         up    1:00:00      1  drain node6

and I have to resume the node.
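
To put the node back in service I use the standard scontrol command; sinfo -R also shows the reason Slurm recorded for the drain:

sinfo -R                                      # show the recorded drain reason
scontrol update NodeName=node6 State=RESUME   # return node6 to service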

All nodes run Debian 9.8 and use InfiniBand and NIS. I have made sure that all nodes have the same configuration, package versions, and running daemons, so I don't see what I am missing.
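
For example, a quick consistency check along these lines, run on each node (the config path assumes Debian's slurm-llnl packaging):

md5sum /etc/slurm-llnl/slurm.conf   # checksum must match across nodes
slurmd -V                           # Slurm version must match across nodes
/etc/init.d/slurmd status           # daemon must be running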


Solution

  • Seems like the issue was connected to the presence of NIS: with glibc's compat-mode name resolution, NIS accounts are only merged in where a '+' entry appears in /etc/passwd, so user lookups were failing on the node and the batch job could not start. I just needed to add this line (the compat wildcard that pulls in all NIS accounts) to the end of /etc/passwd:

    +::::::
    

    and restart slurmd on the node:

    /etc/init.d/slurmd restart
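
    As a quick check afterwards (standard commands; resume the node first as described above), confirm that users now resolve through NIS on the node and resubmit the job:

    getent passwd $USER          # should now return the NIS entry
    scontrol update NodeName=node6 State=RESUME
    sbatch test.slurm
    sinfo -p debug               # node6 should no longer end up in 'drain'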