Tags: linux, bash, sh, slurm, sbatch

How to include a directory named after the current time in SLURM's log path


I have a .slurm file that runs on a Linux GPU cluster. It looks like this:

#!/bin/bash
#SBATCH -o ./myrepo/output.log
#SBATCH -J jobname
#SBATCH --gres=gpu:V100:1
#SBATCH -c 5
source /home/LAB/anaconda3/etc/profile.d/conda.sh
conda activate cuda9.1
CUDA_VISIBLE_DEVICES=0 python train.py

Now I want to add a folder named after the current time to the log path, so that it looks something like this:

#!/bin/bash
#SBATCH -o ./myrepo/**currenttime**/output.log
#SBATCH -J jobname
#SBATCH --gres=gpu:V100:1
#SBATCH -c 5
source /home/LAB/anaconda3/etc/profile.d/conda.sh
conda activate cuda9.1
CUDA_VISIBLE_DEVICES=0 python train.py

I have tried:

#!/bin/bash
time=`date +%Y%m%d-%H%M%S`
#SBATCH -o ./myrepo/${time}/output.log
#SBATCH -J jobname
#SBATCH --gres=gpu:V100:1
#SBATCH -c 5
source /home/LAB/anaconda3/etc/profile.d/conda.sh
conda activate cuda9.1
CUDA_VISIBLE_DEVICES=0 python train.py

But it failed. It seems that the #SBATCH lines have to come directly after #!/bin/bash.
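Why this fails, and one way around it: sbatch reads the #SBATCH lines itself before the job starts, stops at the first line that is neither a comment nor blank, and never passes those lines through the shell. So `${time}` would not be expanded even if the `#SBATCH -o` line were moved above the assignment. One workaround, sketched here under assumptions, is to compute the path at submit time and pass `-o` on the sbatch command line, which overrides the `#SBATCH -o` line in the script. The helper name `make_log_path` and the script name `train.slurm` are hypothetical placeholders, and the sketch assumes you submit from the directory that contains `myrepo`.

```shell
#!/bin/bash
# Hypothetical submit-side helper: sbatch does not expand shell variables
# in #SBATCH lines, so the timestamped path is computed here instead.
make_log_path() {
    local base="$1"
    local stamp
    stamp=$(date +%Y%m%d-%H%M%S)
    # Slurm will not create missing directories for the -o path.
    mkdir -p "${base}/${stamp}"
    printf '%s\n' "${base}/${stamp}/output.log"
}

log=$(make_log_path ./myrepo)

# -o given on the command line overrides the #SBATCH -o line in the script.
# "train.slurm" is a placeholder name for the job script.
if command -v sbatch >/dev/null 2>&1; then
    sbatch -o "$log" train.slurm
else
    echo "sbatch not found; would run: sbatch -o $log train.slurm"
fi
```

Two submissions within the same second would share a folder; using `output_%j.log` as the file name inside it would keep their logs apart.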

The following version does work, but with it I can't run more than one job at a time, because every job writes to the same ./myrepo/output.log.

#!/bin/bash
#SBATCH -o ./myrepo/output.log
#SBATCH -J jobname
#SBATCH --gres=gpu:V100:1
#SBATCH -c 5
source /home/LAB/anaconda3/etc/profile.d/conda.sh
conda activate cuda9.1
time=`date +%Y%m%d-%H%M%S`
CUDA_VISIBLE_DEVICES=0 python train.py
cp ./myrepo/output.log ./myrepo/${time}/output.log

How can I solve this problem?


Solution

  • This works for me:

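    #!/bin/bash
    #SBATCH -o ./myrepo/output_%j.log
    #SBATCH -J jobname
    #SBATCH --gres=gpu:V100:1
    #SBATCH -c 5
    # ./myrepo must already exist at submit time; Slurm will not create it.
    time=$(date +%Y%m%d-%H%M%S)
    mkdir -p "./myrepo/${time}"
    source /home/LAB/anaconda3/etc/profile.d/conda.sh
    conda activate cuda9.1
    CUDA_VISIBLE_DEVICES=0 python train.py
    # Move this job's log (%j above is $SLURM_JOB_ID here) into the timestamped folder.
    mv "./myrepo/output_${SLURM_JOB_ID}.log" "./myrepo/${time}/output.log"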
    

    #SBATCH -o ./myrepo/output_%j.log names the output file output_<jobid>.log: in #SBATCH lines, %j is replaced with the job ID. In the bash part of the script the same value is available as $SLURM_JOB_ID. The last line moves the log into the folder named after the current time. This way you can run more than one job at a time, and each job's results end up in a separate folder.
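A variant that avoids the final mv, sketched under the assumption that you only need your program's own output (not Slurm's messages) in the timestamped folder: create the folder at the start of the job and redirect the script's stdout and stderr into it with bash's `exec`. These lines would go right after the #SBATCH block, before the source/conda/python lines. Anything printed before the `exec` line, and any messages Slurm itself adds, still go to the #SBATCH -o file. Appending the job ID to the folder name is an addition of this sketch, not part of the answer above; it keeps two jobs that start in the same second apart.

```shell
#!/bin/bash
# SLURM_JOB_ID is set inside a Slurm job; "local" is only a fallback
# so the snippet can be tried outside Slurm.
job_id="${SLURM_JOB_ID:-local}"
time=$(date +%Y%m%d-%H%M%S)
logdir="./myrepo/${time}_${job_id}"
mkdir -p "$logdir"

# From here on, everything the script writes to stdout/stderr goes to
# the timestamped folder instead of the #SBATCH -o file.
exec > "${logdir}/output.log" 2>&1

echo "job ${job_id} started at ${time}"
# ...source conda.sh, conda activate and python train.py follow as before...
```

Because the redirection is in place from the start, the log appears in its final location while the job is still running, not only after it finishes.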