How can I make my Slurm script loop over a list of file names?


I have a Slurm script that runs my Python code:

#!/bin/bash -l                                                                                                    
#SBATCH --nodes=1                                                                                                 
#SBATCH --ntasks=1                                                                                                
#SBATCH --cpus-per-task=1                                                                                         
#SBATCH --mem=10G                                                                                                 
#SBATCH --account=my_account                                                                                 
#SBATCH --qos=default                                                                                           
#SBATCH --time=2-00:00:00                                                                                         
###Array setup here                                                                                               
#SBATCH --array=1                                                                                                 
#SBATCH --open-mode=truncate                                                                                      
#SBATCH --output=out_files/output.o                                                                              

module purge
module load my_cluster
module load Miniconda3/4.9.2

eval "$(${EBROOTMINICONDA3}/bin/conda shell.bash hook)"

conda activate my_conda_env

cd /my_directory

python my_python_code.py -filename file_a.txt

This works, but at the moment it launches only one job, with file_a.txt as its argument.

How can I launch 10 simultaneous jobs? I know I can use:

#SBATCH --array=1-10  

but I want to use file_a.txt as the argument for job 1, file_b.txt as the argument for job 2, etc.

I would like to provide the list of file names as a separate text file, if possible, which is read by the Slurm script.


Solution

  • As per the docs, Slurm sets the SLURM_ARRAY_TASK_ID environment variable to each task's array index, so with --array=1-10 it runs from 1 to 10. We can pass this variable to sed to get the Nth line from a list of files.

    my_files.txt

    file_a.txt
    file_b.txt
    file_c.txt
    

    Credit to this answer for the sed -n "xp" command.

    my_slurm_job.sh

    #!/bin/bash -l                                                                                                    
    #SBATCH --nodes=1                                                                                                 
    #SBATCH --ntasks=1                                                                                                
    #SBATCH --cpus-per-task=1                                                                                         
    #SBATCH --mem=10G                                                                                                 
    #SBATCH --account=my_account                                                                                 
    #SBATCH --qos=default                                                                                           
    #SBATCH --time=2-00:00:00                                                                                         
    ###Array setup here: one task per line of my_files.txt
    #SBATCH --array=1-3
    #SBATCH --open-mode=truncate                                                                                      
    #SBATCH --output=out_files/%a_output.o                                                                              
    
    module purge
    module load my_cluster
    module load Miniconda3/4.9.2
    
    eval "$(${EBROOTMINICONDA3}/bin/conda shell.bash hook)"
    
    conda activate my_conda_env
    
    cd /my_directory
    
    # Get the Nth line from my_files.txt
    file_name=$(sed -n "${SLURM_ARRAY_TASK_ID}p" < my_files.txt)
    
    python my_python_code.py -filename "${file_name}"
    

    Edited to add the Task ID to the output file name as per FlyingTeller's comment and the Slurm docs.
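
The lookup can be checked on a login node before anything is submitted. The sketch below is only a local dry run, not part of the job script: it sets SLURM_ARRAY_TASK_ID by hand in a loop, and the temporary list file and the lookup helper are made up for the illustration. It also shows that an index past the end of the list gives an empty string, which the job script may want to guard against.

```shell
#!/bin/bash
# Local dry run of the line lookup; Slurm itself is not involved.

# Hypothetical scratch copy of the file list, so no real file is touched.
list_file="$(mktemp)"
printf '%s\n' file_a.txt file_b.txt file_c.txt > "$list_file"

# Same lookup as in the job script: print only line number $1 of the list.
lookup() {
    sed -n "${1}p" < "$list_file"
}

# Count the lines to see how large the array range should be.
n_files=$(wc -l < "$list_file")
n_files=$((n_files))   # drop the padding some wc implementations add

# Pretend to be each array task in turn.
for SLURM_ARRAY_TASK_ID in $(seq 1 "$n_files"); do
    file_name=$(lookup "$SLURM_ARRAY_TASK_ID")
    echo "task ${SLURM_ARRAY_TASK_ID} -> ${file_name}"
done

# An index past the end of the list yields an empty string.
past_end=$(lookup $((n_files + 1)))
if [ -z "$past_end" ]; then
    echo "task $((n_files + 1)) -> (no such line)"
fi

rm -f "$list_file"
```

The line count from wc -l is the upper bound to use in the --array range, assuming my_files.txt ends with a newline and contains no blank lines.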