cluster-computing slurm

Question about the salloc command: Where does it execute?


I have a question about the salloc command in a cluster environment. When I run salloc -n 1 --gpus=1 hostname on the login node, it prints the hostname of the login node, although I expected the hostname of the compute node. Similarly, when I run salloc -n 1 --gpus=1 without a command, it starts /bin/bash on the login node with the resources allocated.

My question is: if the command is not a shell like /bin/bash, does salloc have any effect? Will it only allocate resources and execute the command on the login node, without using the compute nodes? It seems that salloc only uses the compute nodes when executing shell commands.

I would appreciate any clarification on this matter. Thank you.


Solution

  • With the default configuration, salloc only creates an allocation: it requests resources, blocks until they are available, and then starts a shell (or the command you passed to it) on the login node, not on the allocated node. In that shell, you can start a parallel program with srun or mpirun, and its processes will run on the allocated nodes. Or you can run

    srun --pty /bin/bash -l
    

    and you will have a shell running on the allocated node.
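
    The same idea applies to a non-shell command such as hostname: salloc runs it locally, so it has to be wrapped in srun to reach the allocated node. A minimal sketch, reusing the options from the question; on_compute_node and the DRY_RUN variable are hypothetical names introduced here, not part of Slurm:

    ```shell
    # Hypothetical helper: run a command on the allocated node.
    # salloc starts its child process on the login node; because that child
    # is srun, the actual command is launched as a job step in the allocation.
    on_compute_node() {
        if [ -n "${DRY_RUN:-}" ]; then
            # Only print the command line that would be executed.
            echo "salloc -n 1 --gpus=1 srun $*"
        else
            salloc -n 1 --gpus=1 srun "$@"
        fi
    }
    ```

    With this, on_compute_node hostname should print the compute node's name, whereas salloc -n 1 --gpus=1 hostname prints the login node's name.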

    Alternatively, and this has been the officially recommended way for some time, you can use the srun command directly (i.e., outside a salloc session) like this:

    srun -n 1 --gpus=1 --pty /bin/bash -l
    

    for the same result.
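
    If it is unclear which kind of shell you ended up in, the environment can give a hint. The sketch below assumes that SLURM_JOB_ID is set inside any allocation and that SLURM_STEP_ID is set only inside a job step launched by srun; slurm_shell_context is a hypothetical name, and the check is a heuristic rather than an official Slurm interface:

    ```shell
    # Hypothetical helper: classify the current shell by Slurm's environment.
    slurm_shell_context() {
        if [ -z "${SLURM_JOB_ID:-}" ]; then
            # No allocation at all, e.g. a plain login shell.
            echo "no-allocation"
        elif [ -z "${SLURM_STEP_ID:-}" ]; then
            # Inside an allocation but not inside a job step,
            # e.g. the local shell started by salloc.
            echo "allocation-shell"
        else
            # Inside a job step, e.g. a shell started with srun --pty.
            echo "job-step"
        fi
    }
    ```

    Running slurm_shell_context together with hostname in each shell shows whether commands typed there execute locally or inside a step on the allocated node.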

    This has confused users for a long time, especially since Slurm used to recommend defining SallocDefaultCommand="srun -n1 -N1 --mem-per-cpu=0 --pty --preserve-env --cpu-bind=no --mpi=none $SHELL" in slurm.conf, which had the effect of starting an srun session automatically whenever a user ran salloc.

    In newer versions (20.11 and later), Slurm has the option LaunchParameters=use_interactive_step, which is meant to become the default and makes salloc the command to use to get a shell on the first node of the allocation, while properly handling cgroups and tasksets.
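
    To find out how a given cluster is configured, the running configuration can be inspected with scontrol show config. A minimal sketch, assuming the output contains a line of the form "LaunchParameters = ..."; has_interactive_step is a hypothetical helper name:

    ```shell
    # Hypothetical helper: reads `scontrol show config` output on stdin and
    # succeeds if use_interactive_step appears in LaunchParameters.
    has_interactive_step() {
        grep -E '^LaunchParameters[[:space:]]*=' | grep -q 'use_interactive_step'
    }
    ```

    For example, scontrol show config | has_interactive_step && echo "salloc opens a shell on the allocated node" prints the message only when the option is enabled.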