Tags: julia, cluster-computing, distributed-computing, slurm, sbatch

Does SLURM sbatch Automatically Copy User Script Across Nodes?


Should SLURM (specifically sbatch) automatically copy the user script (as opposed to the batch submission script itself) to the cluster's compute nodes for execution? When I submit the batch script from my login node, the output file is created on one of my compute nodes, but it contains the following:

ERROR: could not open file /home/pi/slurm.jl
Stacktrace:
 [1] include at ./boot.jl:328 [inlined]
 [2] include_relative(::Module, ::String) at ./loading.jl:1105
 [3] include(::Module, ::String) at ./Base.jl:31
 [4] exec_options(::Base.JLOptions) at ./client.jl:287
 [5] _start() at ./client.jl:460

I'm running the batch script with sbatch julia.sbatch.

julia.sbatch:

#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks=4
#SBATCH --time=00:15:00
#SBATCH --output=julia.out
#SBATCH --job-name=julia-job

julia slurm.jl

Or should the script (slurm.jl) be located on shared storage accessible to all of the nodes?


Solution

  • Slurm will not copy files other than the submission script to the compute nodes. From the Quick Start User Guide:

    Slurm does not automatically migrate executable or data files to the nodes allocated to a job. Either the files must exist on local disk or in some global file system (e.g. NFS or Lustre).

    On most clusters, the /home directory is an NFS filesystem shared across the login and compute nodes. The error above indicates that /home/pi/slurm.jl does not exist on the compute node that ran the batch script, so slurm.jl must either be placed on storage that every node can reach or be copied to the nodes before it is run (see the sketch below).
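
    As a minimal sketch (assuming the cluster exposes a shared mount, here called /clusterfs purely for illustration), the batch script can simply reference the Julia script by a path that exists on every node:

    #!/bin/bash
    #SBATCH --nodes=4
    #SBATCH --ntasks=4
    #SBATCH --time=00:15:00
    #SBATCH --output=julia.out
    #SBATCH --job-name=julia-job

    # /clusterfs/slurm.jl is a hypothetical path on a shared filesystem;
    # substitute whatever NFS/Lustre mount your cluster actually provides.
    julia /clusterfs/slurm.jl

    Alternatively, if no shared filesystem is available, Slurm's sbcast utility can stage a copy of a file onto local disk on every node allocated to the job from inside the batch script, e.g. sbcast slurm.jl /tmp/slurm.jl followed by srun julia /tmp/slurm.jl.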