
SLURM/sbatch: How to pass both a pointer to my data and the SLURM job ID to a job script


I am using a job array to process a large number of files. I can pass a pointer from my array to the specific data file to be processed in the job script, but I also want to pass the specific SLURM job ID to the script, and I can't find the correct syntax to do so.

My array script currently looks like this:

#!/bin/bash
# ============================================
#SBATCH --job-name=sortdata
...
#SBATCH --output=down1count/sort_%A_%a.txt
#SBATCH --array=0-99
# ============================================
SIZE=30
INDEX_FILE="down1list.txt"

IDXZERO=$(( SLURM_ARRAY_TASK_ID * SIZE ))
IDXBEG=$(( IDXZERO + 1 ))
IDXEND=$(( IDXBEG + SIZE - 1 ))

for IDX in $(seq "$IDXBEG" "$IDXEND"); do
        # Read line number IDX of the index file (one file name per line)
        DATA=$(sed -n "${IDX}p" "$INDEX_FILE")
        sortfile1.bash "$DATA"
done

where down1list.txt is just a list of the files in the directory, created with ls down1/ >> down1list.txt.

The relevant part of my job script sortfile1.bash looks like this:

#!/bin/bash

for file in "down1/$@"; do
    gunzip $file 

    ###do some more stuff with the file####

done

I would like to use my cluster's larger file system storage, but it can only be accessed through my ${SLURM_JOB_ID}. I would then mv the file there before unzipping it in the code above. I've looked at a number of questions and answers on this site, and I can't find anything that covers the syntax I am missing.

I believe by using $@ I ought to be able to access the ${SLURM_JOB_ID} but I can't figure out how to add it correctly to the sortfile1.bash $DATA line or how I would call it in my sortfile1.bash code. I tried just adding it directly like this: sortfile1.bash $DATA %A_%a but that doesn't seem to work.


Solution

  • The ${SLURM_JOB_ID} environment variable should be visible to every program that runs as part of the job, so you should be able to use it directly in sortfile1.bash without passing it as an argument.

    Should that not be the case, the usual approach is to pass the variable as the first argument and use the shift builtin to skip it once its value has been stored in another variable, like this:

    #!/bin/bash
    
    JID=$1
    shift
    
    for name in "$@"; do
        file="down1/$name"   # prefix each argument; "down1/$@" would prefix only the first
        gunzip "$file"
    
        ###do some more stuff with the file####
    
    done
    

    and call it like this in the submission script:

    sortfile1.bash "$SLURM_JOB_ID" "$DATA"
    

    After shift is called, $@ holds all the arguments except the first one; each remaining argument moves down one position ($2 becomes $1, $3 becomes $2, and so on).
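
The effect of shift can be checked outside of SLURM with a small stand-alone sketch. The function name plan_files and the literal job ID 12345 are made up for this illustration; in a real job the caller would pass "$SLURM_JOB_ID" instead, and the printf would be replaced by the actual file handling.

```shell
#!/bin/bash
# Sketch of sortfile1.bash-style argument handling (plan_files is a
# hypothetical name used only for this illustration).
plan_files() {
    local jid=$1    # first argument: the job ID passed by the caller
    shift           # discard it; "$@" now holds only the file names
    local name
    for name in "$@"; do
        # Prefix each name separately so every argument gets the directory.
        printf '%s|down1/%s\n' "$jid" "$name"
    done
}

# Inside a job the caller would pass "$SLURM_JOB_ID"; a literal stands in here.
plan_files 12345 a.gz b.gz
```

This prints one line per file name, each paired with the job ID. As for the attempt in the question, %A and %a are only expanded in sbatch file name options such as --output; in the body of the script the corresponding values are available as $SLURM_ARRAY_JOB_ID and $SLURM_ARRAY_TASK_ID, while $SLURM_JOB_ID is distinct for each array task.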