How to get job ID from job array within for loop?


I am running a greedy feature selection algorithm, and I am attempting to use job arrays to explore parallelization.

The idea is that we have three steps that depend on the previous step:

  • Step 1: Setup for iteration i

  • Step 2: Fit models at iteration i

  • Step 3: Find best model at iteration i

Because all the models (more than 10) must finish training before step 3 can start, plain job chaining is not enough. So I am trying to use job arrays, which do exactly what I want: step 3 starts only once all my models have been fitted.

However, I am having trouble setting up the dependency. I was told that the dependency for a whole job array needs to be the job ID (which is a number) and not the job name (e.g. runSetup$n_subject$i).

So: how do I get the job ID of the whole job array? Or better yet: what is the best way to set a dependency on a whole job array?

This answer is very interesting, but doesn't tell me how to best set a dependency when my job array contains 10 or more jobs.

#!/bin/bash

# Subject to consider
n_subject=$1 # takes in input arguments from command line.
cohort=$2
priors_and_init=$3
nparam=16

for ((i = 1; i <= $nparam; i++)); do
    # Run setup
    if [[ $i -eq 1 ]]; then
      bsub -J "runSetup$n_subject$i" matlab  -singleCompThread -nodisplay -r "setup_greedy_forward($n_subject,$cohort, $priors_and_init, $i)"
    else
      last_iter=$((i-1))
      bsub -J "runSetup$n_subject$i" -w "done(saveBest$n_subject$last_iter)" matlab  -singleCompThread -nodisplay -r "setup_greedy_forward($n_subject,$cohort, $priors_and_init, $i)"
    fi

    # Fit models
    max_sim=$((nparam-i+1))
    bsub -W 08:00 -J "fitDCMs$n_subject[1-$max_sim]" -w "done(runSetup$n_subject$i)" -R "rusage[mem=16000]" matlab  -singleCompThread -nodisplay -r "fit_dcm_greedy_forward($n_subject,$cohort, $priors_and_init, \$LSB_JOBINDEX)"

    # Extracting the job ID from the fitDCMs jobs
    # Then: For all trained DCMs, get the best model and save it
    JOBID=$(get_jobid bsub -W 08:00 -J "fitDCMs$n_subject[1-$max_sim]" -w "done(runSetup$n_subject$i)" -R "rusage[mem=16000]" matlab  -singleCompThread -nodisplay -r "fit_dcm_greedy_forward($n_subject,$cohort, $priors_and_init, \$LSB_JOBINDEX)" 2> /dev/null)
    if [ -n "$jobid" ]; then
        bsub -J "saveBest$n_subject$i" -w "numdone($JOBID,*)" matlab -singleCompThread -nodisplay -r "save_best_model($n_subject,$cohort, $priors_and_init, $i)"
    fi
done


The output I am getting:

MATLAB job.
Job <94564566> is submitted to queue <normal.24h>.
MATLAB job.
Job <94564567> is submitted to queue <normal.24h>.
MATLAB job.
saveBest121: No matching job found. Job not submitted.
MATLAB job.
runSetup122: No matching job found. Job not submitted.
[…]

Solution

  • After searching a bit, I found a way to get the job ID.

    JOBID=$(bsub command1 | awk '/is submitted/{print substr($2, 2, length($2)-2);}')
    if [ -n "$JOBID" ]; then
        bsub -w "numdone($JOBID,*)" command2
    fi
    

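    The first line submits the job and extracts its job ID: bsub prints a line such as "Job <94564566> is submitted to queue <normal.24h>.", and awk takes the second field (<94564566>) and strips the angle brackets. The second submission then waits until every element of that job array is done.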

    This answer was found here.
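For completeness, here is a rough sketch of how the snippet above could be applied to the loop from the question. As far as I can tell, the original script failed for two reasons: the fitDCMs array was submitted twice, and the check tested $jobid while the variable that had been set was $JOBID, so saveBest was never submitted and the next runSetup found no matching job. In the sketch, every dependency uses a captured job ID instead of a job name. The names submit_and_get_id and submit_greedy_chain are my own helpers, not LSF commands, and everything is wrapped in functions so that nothing is submitted until you call it.

```shell
#!/bin/bash

# Hypothetical helper (not part of LSF): submit a job with bsub and print
# only its numeric ID, parsed from the "Job <ID> is submitted ..." line.
# Returns non-zero when no such line is seen, e.g. a rejected submission.
submit_and_get_id() {
    local id
    id=$(bsub "$@" | awk '/is submitted/ { print substr($2, 2, length($2) - 2) }')
    [ -n "$id" ] && printf '%s\n' "$id"
}

# Sketch of the three-step chain; arguments mirror the question's script.
# Usage: submit_greedy_chain <n_subject> <cohort> <priors_and_init> [nparam]
submit_greedy_chain() {
    local n_subject=$1 cohort=$2 priors_and_init=$3 nparam=${4:-16}
    local matlab=(matlab -singleCompThread -nodisplay -r)
    local i max_sim setup_id fit_id save_id=""

    for ((i = 1; i <= nparam; i++)); do
        # Step 1: setup waits for the previous iteration's saveBest job.
        local dep=()
        [ -n "$save_id" ] && dep=(-w "done($save_id)")
        setup_id=$(submit_and_get_id -J "runSetup${n_subject}$i" "${dep[@]}" \
            "${matlab[@]}" "setup_greedy_forward($n_subject,$cohort,$priors_and_init,$i)") || return 1

        # Step 2: one array element per candidate model, submitted once.
        max_sim=$((nparam - i + 1))
        fit_id=$(submit_and_get_id -W 08:00 -J "fitDCMs${n_subject}[1-$max_sim]" \
            -w "done($setup_id)" -R "rusage[mem=16000]" \
            "${matlab[@]}" "fit_dcm_greedy_forward($n_subject,$cohort,$priors_and_init,\$LSB_JOBINDEX)") || return 1

        # Step 3: runs only after every element of the array is done.
        save_id=$(submit_and_get_id -J "saveBest${n_subject}$i" \
            -w "numdone($fit_id,*)" \
            "${matlab[@]}" "save_best_model($n_subject,$cohort,$priors_and_init,$i)") || return 1
    done
}
```

Calling submit_greedy_chain "$1" "$2" "$3" at the bottom of the script would reproduce the original invocation. The function stops at the first submission that yields no job ID, so a rejected job does not leave later steps waiting on a dependency that does not exist.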