Tags: bash, parallel-processing, cluster-computing, sungridengine, qsub

Running parallel Jobs


I am working on the titan cluster, which consists of 464 HP blade systems, two head nodes, and a virtualized pool of login (submit) nodes. Each node has eight cores (two quad-core processors) and either 16 GB (430 nodes) or 32 GB (34 nodes) of memory. This provides 3712 compute cores and about 8 TB of total RAM.

The task is to process two parts, R1 and R2, for each sample. Every sample has an R1 file and an R2 file, which come in pairs and must be used together to create a .sam file. The reference is human_g1k_v37.fasta and the software is BWA. I am using a for loop for this, but I am not able to parallelize it on the cluster, and running the samples one at a time would take very long. The script below runs each pair iteratively, one at a time (this works):

sourcedir=/sourcepath/
destdir=/destinationpath/


for fname in *_R1.fastq.gz
do
base=${fname%_R1*}
bwa-0.7.5a/bwa mem -t 8 human_g1k_v37.fasta "${base}_R1.fastq.gz" "${base}_R2.fastq.gz" >   "$destdir/${base}_R1_R2.sam" 

done

Since the for loop runs every job on the same node, I tried appending "&", but I believe that has the same limitation, and it does not seem to work in this case. I need all these processes to run in parallel on different processors (perhaps as an array job?). The script below does not work for parallel processing:

sourcedir=/sourcepath/
destdir=/destinationpath/


for fname in *_R1.fastq.gz
do
base=${fname%_R1*}
bwa-0.7.5a/bwa mem -t 8 human_g1k_v37.fasta "${base}_R1.fastq.gz" "${base}_R2.fastq.gz" >   "$destdir/${base}_R1_R2.sam" &

done
wait

For more details, please see my earlier post, "Looping files in bash".

Thanks


Solution

  • How about just submitting each job to your grid engine? It will handle the parallelization itself:

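    #! /bin/sh
    ### Your script task.sh ###
    #$ -S /bin/sh
    #$ -cwd
    # -cwd runs the job in the submission directory so the relative paths resolve.
    # destdir is set here because the job does not see the submitting shell's variables.
    destdir=/destinationpath/

    bwa-0.7.5a/bwa mem -t 8 human_g1k_v37.fasta "${1}_R1.fastq.gz" "${1}_R2.fastq.gz" > "$destdir/${1}_R1_R2.sam"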
    

    and then, in your loop:

    for fname in *_R1.fastq.gz
    do
        base=${fname%_R1*}
        qsub task.sh "$base"
    done
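
Since the question mentions an array job, here is a minimal sketch of that variant. It assumes Sun Grid Engine's `qsub -t` option and the `SGE_TASK_ID` variable that SGE sets for each task. The script name `array_task.sh` and the helper `sample_for_task` are hypothetical, and `/destinationpath/` is the placeholder path from the question.

```shell
#!/bin/sh
### array_task.sh (hypothetical name): one array task per sample ###
#$ -S /bin/sh
#$ -cwd

# Print the sample base name for a 1-based task index.
# The glob expands in sorted order, so each index maps to one fixed sample.
# The function body runs in a subshell, so "set --" does not leak out.
sample_for_task() (
    idx=$1
    set -- *_R1.fastq.gz
    shift $((idx - 1))
    printf '%s\n' "${1%_R1*}"
)

# SGE sets SGE_TASK_ID to the task number inside an array job and to
# "undefined" in a non-array job, so alignment only runs for array tasks.
if [ -n "${SGE_TASK_ID:-}" ] && [ "$SGE_TASK_ID" != "undefined" ]; then
    destdir=/destinationpath/   # placeholder path from the question
    base=$(sample_for_task "$SGE_TASK_ID")
    bwa-0.7.5a/bwa mem -t 8 human_g1k_v37.fasta \
        "${base}_R1.fastq.gz" "${base}_R2.fastq.gz" > "$destdir/${base}_R1_R2.sam"
fi
```

It would be submitted once with `qsub -t 1-N array_task.sh`, where N is the number of `*_R1.fastq.gz` files (for example, the output of `ls *_R1.fastq.gz | wc -l`). Note that `-t 8` only sets BWA's thread count; whether each task actually gets eight cores depends on how the site's parallel environments are configured.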