On newly installed and configured compute nodes in our small cluster, I am unable to submit Slurm jobs using a batch script and the 'sbatch' command. After submission, the requested node changes to the 'drained' state. However, I can run the same command interactively using 'srun'.
Works:
srun -p debug --ntasks=1 --nodes=1 --job-name=test --nodelist=node6 -l echo 'test'
Does not work:
sbatch test.slurm
with test.slurm:
#!/bin/sh
#SBATCH --job-name=test
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --nodelist=node6
#SBATCH --partition=debug
echo 'test'
After submitting, sinfo shows:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug     up    1:00:00   1     drain node6
and I have to resume the node.
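When a node drains like this, Slurm records a reason alongside the state, and that reason usually points at the cause. Here is a minimal sketch of inspecting it and bringing the node back, assuming the standard `scontrol` tool is on the PATH. The function names and the `SCONTROL` override are hypothetical helpers for illustration, not part of Slurm:

```shell
#!/bin/sh
# Hypothetical helpers. SCONTROL can be overridden, e.g. for a dry run.
SCONTROL="${SCONTROL:-scontrol}"

# Print the node record, which includes the State= and Reason= fields.
show_node() {
    $SCONTROL show node "$1"
}

# Clear the drain state so the node accepts jobs again.
resume_node() {
    $SCONTROL update "NodeName=$1" State=RESUME
}
```

For example, `show_node node6` followed by `resume_node node6`. Running `sinfo -R` also lists the recorded reason for every down or drained node.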
All nodes run Debian 9.8 and use InfiniBand and NIS. I have made sure that all nodes have the same configuration, package versions, and running daemons, so I don't see what I am missing.
It seems the issue was related to NIS. I just needed to add this line to the end of /etc/passwd:
+::::::
and restart slurmd on the node:
/etc/init.d/slurmd restart
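To check a node for this before submitting, one can test whether the compat entry is present and whether the submitting user resolves there. This is a minimal sketch, assuming /etc/nsswitch.conf uses `passwd: compat` (the mode in which a `+::::::` entry is honoured). `has_nis_compat_entry` and `user_resolves` are hypothetical helper names:

```shell
#!/bin/sh
# Succeed if FILE (default: /etc/passwd) contains the NIS compat line.
has_nis_compat_entry() {
    grep -qxF -e '+::::::' -- "${1:-/etc/passwd}"
}

# Succeed if the given user can be looked up through NSS (files, NIS, ...).
user_resolves() {
    getent passwd "$1" >/dev/null
}
```

For example, `has_nis_compat_entry && user_resolves "$USER" && echo ok` run on node6 should print `ok` once the fix is in place.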