How to generate different scripts to run on each directory in linux?


I have a directory main that contains around 100 subdirectories. For example:

main
 |__ test_1to50000
 |__ test_50001to60000
 |__ test_60001to70000
 |__ test_70001to80000
 |__ test1.sh

I have an sbatch script, test1.sh, that runs on the first directory:

#!/bin/bash

#SBATCH --job-name=sbatchJob   
#SBATCH --cpus-per-task=16       
#SBATCH --mem-per-cpu=8G    
#SBATCH --time=1-00:00:00
#SBATCH --qos=1day
if [ -f ~/.bashrc ] ; then
    . ~/.bashrc
fi

module load Perl/5.28.0-GCCcore-8.2.0

perl path/to/software --cpu 16 --run /path/to/test_1to50000 command /path/to/test_1to50000/software.`date +"%m_%d_%y_%H-%M-%S"`.log

There are about 100 directories, so I would like to create one script per directory and submit them all. How can I generate sbatch scripts like the one above for all the other directories?


Solution

  • Your best option is to use a job array with a script like this:

    #!/bin/bash
    #SBATCH --array=0-3   # last index = number of directories - 1 (e.g. 0-99 for 100 directories)
    #SBATCH --job-name=sbatchJob   
    #SBATCH --cpus-per-task=16       
    #SBATCH --mem-per-cpu=8G    
    #SBATCH --time=1-00:00:00
    #SBATCH --qos=1day
    if [ -f ~/.bashrc ] ; then
        . ~/.bashrc
    fi
    
    module load Perl/5.28.0-GCCcore-8.2.0
    DIRS=(main/*/)    # All subdirectories of main; the path is relative to the directory you submit from
    CURRDIR="${DIRS[$SLURM_ARRAY_TASK_ID]%/}" # Directory handled by the current task, trailing slash removed
    
    perl path/to/software --cpu 16 --run "$CURRDIR" command "$CURRDIR/software.$(date +"%m_%d_%y_%H-%M-%S").log"
    

    This creates a job array with one task per directory. You need to set the array range to match the number of directories. With the array, you can then manage all the jobs with a single command, get a single email when all jobs are finished, and reduce the load on the scheduler.
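
Since the array range has to match the number of directories, one way to avoid hard-coding it is to compute the range at submission time. The following is a minimal sketch, not part of the original answer: `array_range` is a hypothetical helper name, and it assumes the same `main/*/` layout as above. It only prints the range; the submission itself is shown as a comment.

```shell
#!/bin/bash
# Hypothetical helper (not from the original answer): print a zero-based
# --array range that covers every subdirectory of the given parent directory.
array_range() {
    # Expand the glob into the positional parameters. The trailing slash
    # restricts the matches to directories, so files such as test1.sh are skipped.
    set -- "$1"/*/
    # An unmatched glob is left as a literal string, so make sure the
    # first entry really is a directory before counting.
    if [ ! -d "$1" ]; then
        echo "no subdirectories found" >&2
        return 1
    fi
    # $# is the number of matched directories; indices run from 0 to $# - 1.
    echo "0-$(( $# - 1 ))"
}

# Assumed usage, run from the directory that contains main:
#   sbatch --array="$(array_range main)" test1.sh
```

An `--array` value given on the `sbatch` command line takes precedence over the `#SBATCH --array` line in the script, so the script itself can stay unchanged when directories are added or removed.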