Tags: arrays, shell, bioinformatics, sungridengine

SGE array jobs with multiple inputs


I would like help writing a shell script that submits an array job in which each task takes multiple input files. Here is how I currently run array jobs that take one input file per task:

DIR=/WhereMyFilesAre
LIST=("$DIR"/*fastq)        # files I want to process
INDEX=$((SGE_TASK_ID - 1))  # SGE task IDs start at 1, bash arrays at 0
INPUT_FILE=${LIST[$INDEX]}

bwa aln "$DIR/referencegenome.fasta" "$INPUT_FILE" > "${INPUT_FILE%.fastq}.sai"

I want to do something similar, but with two or more lists of files instead of one, and the files need to be paired correctly. For instance, suppose I had File1_A.txt, File1_B.txt, File2_A.txt, and File2_B.txt, and a command of the general form

program input1 input2 > output

I would want the resulting jobs to have lines that look like

program File1_A.txt File1_B.txt > File1.txt

program File2_A.txt File2_B.txt > File2.txt

Solution

  • If, as in your example, the two input files follow a fixed naming scheme and differ only by the index, use SGE_TASK_ID directly as the index in your job script:

    program File${SGE_TASK_ID}_A.txt File${SGE_TASK_ID}_B.txt > File${SGE_TASK_ID}.txt
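If the file names are not numbered consecutively, one possible approach is to combine this with the array lookup from the question: build the list from the `_A` files only and derive each partner name from the shared prefix. The following is a minimal sketch under those assumptions, not a tested production script. `pick_pair` is a hypothetical helper name, `program` and the `_A.txt`/`_B.txt` suffixes are the placeholders from the question, and `DIR` is assumed to be set as in the question (the sketch falls back to the current directory).

```shell
#!/bin/bash
# Sketch only: "program" and the _A/_B suffixes are placeholders from the question.

# pick_pair DIR TASK_ID (hypothetical helper):
# sets IN_A, IN_B and OUT for the given 1-based task ID.
pick_pair() {
    local list prefix
    list=("$1"/*_A.txt)            # one entry per pair, in glob (sorted) order
    IN_A=${list[$(( $2 - 1 ))]}    # SGE task IDs start at 1, bash arrays at 0
    prefix=${IN_A%_A.txt}          # strip the suffix, e.g. /path/File1
    IN_B=${prefix}_B.txt           # the partner file shares the prefix
    OUT=${prefix}.txt
}

# SGE sets SGE_TASK_ID to "undefined" for non-array jobs; skip the run then.
if [ -n "${SGE_TASK_ID:-}" ] && [ "$SGE_TASK_ID" != "undefined" ]; then
    pick_pair "${DIR:-$PWD}" "$SGE_TASK_ID"
    program "$IN_A" "$IN_B" > "$OUT"
fi
```

Submit it with `qsub -t 1-N`, where N is the number of `*_A.txt` files. Because each `_B` name is derived from its `_A` partner rather than taken from a second list, the pairing stays correct even when the glob order is not numeric (File10 sorts before File2).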