I have a slurm script to run my python code:
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=10G
#SBATCH --account=my_account
#SBATCH --qos=default
#SBATCH --time=2-00:00:00
###Array setup here
#SBATCH --array=1
#SBATCH --open-mode=truncate
#SBATCH --output=out_files/output.o
module purge
module load my_cluster
module load Miniconda3/4.9.2
eval "$(${EBROOTMINICONDA3}/bin/conda shell.bash hook)"
conda activate my_conda_env
cd /my_directory
python my_python_code.py -filename file_a.txt
This works, but at the moment it launches a single job and always uses file_a.txt as the argument.
How can I launch 10 simultaneous jobs? I know I can use:
#SBATCH --array=1-10
but I want file_a.txt to be the argument for job 1, file_b.txt the argument for job 2, etc.
If possible, I would like to provide the list of file names in a separate text file that is read by the Slurm script.
As per the docs, the SLURM_ARRAY_TASK_ID environment variable is set to the (1-indexed) task ID of each job in the array. We can use this environment variable with sed to get the Nth line from a list of files:
my_files.txt
file_a.txt
file_b.txt
file_c.txt
Credit to this answer for the sed -n "xp" command.
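For example, from an interactive shell, sed -n "2p" prints just the second line of the list, which is what task 2 will receive:
$ sed -n "2p" my_files.txt
file_b.txt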
my_slurm_job.sh
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=10G
#SBATCH --account=my_account
#SBATCH --qos=default
#SBATCH --time=2-00:00:00
### Array setup here: one task per line in my_files.txt (e.g. 1-10 for ten files)
#SBATCH --array=1-3
#SBATCH --open-mode=truncate
#SBATCH --output=out_files/%a_output.o
module purge
module load my_cluster
module load Miniconda3/4.9.2
eval "$(${EBROOTMINICONDA3}/bin/conda shell.bash hook)"
conda activate my_conda_env
cd /my_directory
# Get the Nth line from my_files.txt, where N is this task's array ID
file_name=$(sed -n "${SLURM_ARRAY_TASK_ID}p" < my_files.txt)
# Quote the variable in case a file name contains spaces
python my_python_code.py -filename "${file_name}"
Edited to add the Task ID to the output file name as per FlyingTeller's comment and the Slurm docs.
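If you would rather not hard-code the array range, one sketch (assuming my_files.txt holds exactly one file name per line) is to size the array at submission time; options passed on the sbatch command line override the #SBATCH directives in the script:
# Launch one array task per line in my_files.txt
sbatch --array=1-$(wc -l < my_files.txt) my_slurm_job.sh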