What is the proper way of configuring Jupyter on a server with Slurm? After reading the docs, I am executing my Python script through Slurm like this (I am not sure if this is valid):
$ srun -n 1 --time=02:00:00 --cpus-per-task=14 --mem=64gb --part=cluster-job --gres=gpu:rtx2080ti:1 python ./src/main.py
Then, I get:
srun: job 2216877 queued and waiting for resources
When I do:
(base) [user@cluster ~]$ squeue -u user390284
I get:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2216877 cluster-job python user390284 PD 0:00 1 (Resources)
Is this the correct way of running my script? When I check with htop I do not see any process running. It seems my process is stuck. What is the correct way of using slurm with my script?
This is a correct way to run your script on a compute node with an rtx2080ti GPU. But as Slurm tells you, your job has been queued, and srun will block until Slurm finds 14 CPUs, 64 GB of memory, and a GPU available for you. Until then, squeue will show your job as pending (the PD state).
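If you would rather not have srun tie up your terminal while the job waits for resources, the same job can be submitted non-interactively with sbatch. Here is a minimal sketch of a batch script using the same resources as your srun command; the script name, job name, and output filename are placeholders you should adapt:

```shell
#!/bin/bash
# job.sh -- batch version of the srun command above (a sketch;
# adjust the partition name and paths to your cluster).
#SBATCH --job-name=main
#SBATCH --ntasks=1
#SBATCH --time=02:00:00
#SBATCH --cpus-per-task=14
#SBATCH --mem=64gb
#SBATCH --partition=cluster-job
#SBATCH --gres=gpu:rtx2080ti:1
#SBATCH --output=slurm-%j.out   # stdout/stderr of the job land here

python ./src/main.py
```

Submit it with `sbatch job.sh`: the command returns immediately, the job waits in the queue on its own, and you can follow progress with `squeue -u $USER` and by tailing the output file.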
Running htop on the login node will only show you the processes running there; you will not see the process you submitted unless your cluster has only one node that happens to be the login node as well.
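Since the question mentions Jupyter specifically: a common pattern (a sketch only, assuming your cluster allows SSH connections through the login node; the node name, port, and hostnames below are placeholders) is to start the notebook server inside a Slurm allocation and tunnel to it from your local machine:

```shell
# 1) On the login node: start Jupyter inside a Slurm allocation
#    (same resources as before; --no-browser because the compute
#    node has no display).
srun -n 1 --time=02:00:00 --cpus-per-task=14 --mem=64gb \
     --part=cluster-job --gres=gpu:rtx2080ti:1 \
     jupyter notebook --no-browser --ip=0.0.0.0 --port=8888

# 2) Find which compute node the job landed on:
#    squeue -u $USER  ->  see the NODELIST column, e.g. "node042"
#    ("node042" is a placeholder).

# 3) On your local machine: tunnel port 8888 through the login node
#    (hostnames are placeholders for your cluster).
ssh -N -L 8888:node042:8888 user390284@cluster

# 4) Open http://localhost:8888 in your local browser and paste the
#    token that Jupyter printed in step 1.
```

The srun in step 1 blocks exactly as described above, so keep that terminal open (or wrap it in tmux/screen) for as long as you need the notebook.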