bash cluster-computing job-scheduling sungridengine

Submit SGE job array with random file names


I have a script that was kicking off ~200 jobs for each sub-analysis. I realized that a job array would probably be much better for this, for several reasons. It seems simple enough, but it is not quite working for me. My input files are not numbered, so, following examples I've seen, I do this first:

INFILE=`sed -n ${SGE_TASK_ID}p <pathto/listOfFiles.txt`

My qsub command takes quite a few variables because it reads from and writes to different directories. $res does not change; $INFILE is what I am looping over.

qsub -q test.q -t 1-200 -V -sync y -wd ${res} -b y perl -I /master/lib/ myanalysis.pl -c ${res}/${INFILE}/configFile-${INFILE}.txt -o ${res}/${INFILE}/

Since this was not working, I was curious as to what exactly was being passed. So I did an echo on this and saw that it only seems to expand up to the first time $INFILE is used. So I get:

perl -I /master/lib/ myanalysis.pl -c mydirectory/fileABC/

instead of:

perl -I /master/lib/ myanalysis.pl -c mydirectory/fileABC/configFile-fileABC.txt -o mydirectory/fileABC/

Hoping for some clarity on this and welcome all suggestions. Thanks in advance!

UPDATE: It doesn't look like $SGE_TASK_ID is set on the cluster. I looked for any variable that could be used for an array ID and couldn't find anything. If I see anything else I will update again.
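One way to see the effect of an unset $SGE_TASK_ID without involving the scheduler is to run the same sed lookup locally. The following is only a rough sketch: the list is a throwaway temp file, the file names are made up, and SGE_TASK_ID is set by hand to imitate what happens inside an array task.

```shell
# Throwaway list standing in for pathto/listOfFiles.txt (names are made up).
list=$(mktemp)
trap 'rm -f "$list"' EXIT
printf '%s\n' fileABC fileDEF fileGHI > "$list"

# Inside an array task the scheduler sets SGE_TASK_ID; here it is set by hand.
SGE_TASK_ID=2
picked=$(sed -n "${SGE_TASK_ID}p" "$list")
echo "task 2 -> $picked"

# With the variable unset (as in the shell that calls qsub), sed only
# receives "p" and prints every line of the list instead of one.
# The ":-" form just avoids an error if the shell runs with `set -u`.
unset SGE_TASK_ID
unpicked=$(sed -n "${SGE_TASK_ID:-}p" "$list")
echo "unset  -> $unpicked"
```

With the variable set, the lookup yields a single name. With it unset, the result is the whole list, so anything built from $INFILE at that point will not be a single path.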


Solution

  • Assuming you are using a Grid Engine variant, SGE_TASK_ID should be set within the job. It looks like you are expecting it to be set to some useful value before you call qsub. Submitting a script like this would do roughly what you appear to be trying to do:

    #!/bin/bash
    INFILE=$(sed -n ${SGE_TASK_ID}p <pathto/listOfFiles.txt)
    exec perl -I /master/lib/ myanalysis.pl -c ${res}/${INFILE}/configFile-${INFILE}.txt -o ${res}/${INFILE}/

    Then submit this script with

    res=${res} qsub -q test.q -t 1-200 -V -sync y -wd ${res} myscript.sh
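Before submitting the array, it may help to preview the command each task would run. The sketch below is a local dry run, not part of the job itself. The function name task_command, the value of res, and the two-line list are all placeholders invented for illustration, and the function argument stands in for the SGE_TASK_ID that the scheduler would set inside each task.

```shell
#!/bin/bash
# Dry run: print the command each array task would execute, without calling qsub.
# res and the list contents are placeholders, not the real paths.
res=mydirectory
list=$(mktemp)
trap 'rm -f "$list"' EXIT
printf '%s\n' fileABC fileDEF > "$list"

# task_command is a hypothetical helper; $1 plays the role of SGE_TASK_ID.
task_command() {
    infile=$(sed -n "${1}p" "$list")
    if [ -z "$infile" ]; then
        # The task ID points past the end of the list (or at an empty line).
        echo "no entry for task $1" >&2
        return 1
    fi
    echo "perl -I /master/lib/ myanalysis.pl -c ${res}/${infile}/configFile-${infile}.txt -o ${res}/${infile}/"
}

# Count the lines in the list; the arithmetic expansion strips any padding from wc.
ntasks=$(( $(wc -l < "$list") ))

id=1
while [ "$id" -le "$ntasks" ]; do
    task_command "$id"
    id=$((id + 1))
done
```

The same line count could also feed the -t range (for example -t 1-$ntasks) rather than hard-coding 200, so the array size always matches the list.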
    
