Tags: cluster-computing, job-scheduling, slurm, scientific-computing

Is there an output log/directory for sbatch error messages?


I'm running an sbatch script, and it successfully submits.

sbatch sbatch_script.sh

Submitted batch job 309376

But it does not show up when I run squeue -u <my_username> and no output is generated.

Is there a way to check what went wrong? For instance, is there an environment variable or an output log I can check?


Solution

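  • By default, sbatch writes both standard output and standard error to a file named slurm-<job_id>.out (pattern slurm-%j.out). The file is created by the first node of the job's allocation, relative to the job's working directory (by default, the directory you ran sbatch from). You can instead name the file yourself with -o (or --output), e.g. -o myfile.out. A relative name like this is resolved against the same working directory, and the file then contains both the standard output and standard error streams unless you send stderr elsewhere with -e (--error).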

    My sbatch file:

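    #!/bin/bash
    # Request 1 GPU, 12 GB of memory and 8 CPU cores for a single task on the "gpu" partition
    #SBATCH --gres=gpu:1
    #SBATCH --mem=12G
    #SBATCH -p gpu
    #SBATCH -c 8
    #SBATCH -n 1
    # Write stdout (and, since no -e/--error is given, stderr too) to myfile.out
    # in the job's working directory
    #SBATCH -o myfile.out
    # Activate the conda environment, then run the program
    source ~/anaconda3/bin/activate acl2020
    python main.py --various args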
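A minimal variant of the sbatch file above, offered as a sketch: Slurm replaces `%j` in the `-o`/`-e` file names with the job ID, so every submission gets its own log files, and giving `-e` separately keeps stderr apart from stdout. The `echo` line is a hypothetical addition (not part of the original answer) that records where and when the job started, which helps tell "the job never ran" apart from "the job ran and then failed". The environment name `acl2020` and `main.py --various args` are carried over unchanged from the original and are placeholders for your own setup.

```shell
#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --mem=12G
#SBATCH -p gpu
#SBATCH -c 8
#SBATCH -n 1
# %j expands to the job ID, e.g. myfile_309376.out and myfile_309376.err
#SBATCH -o myfile_%j.out
#SBATCH -e myfile_%j.err

# Hypothetical diagnostic line: confirms the job actually started, and on which node
echo "Job ${SLURM_JOB_ID} started on $(hostname) at $(date)"

# Placeholders from the original script: adjust the environment and arguments
source ~/anaconda3/bin/activate acl2020
python main.py --various args
```

Slurm does not create missing directories in these paths, so if `-o` or `-e` points into a directory that does not exist, the job can fail without writing any log at all. Once a job has dropped out of `squeue`, `sacct -j 309376 --format=JobID,JobName,State,ExitCode,Elapsed` reports its final state and exit code, assuming job accounting is enabled on your cluster.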