
Bash loop on two sets of files simultaneously


I have two sets of files in the same directory:

1.bam 
2.bam
3.bam

and

1.txt
2.txt
3.txt

I need to run a command on each matching pair: 1.bam with 1.txt, 2.bam with 2.txt, and so on.

I tried looping through array indices:

bam=(*.bam)
ids=(*.txt)

for i in ${#bam[@]}; do     
 f=${bam[i]}    
 e=${ids[i]}
 samtools view -N $e $f 
done

And with GNU parallel:

parallel --dry-run 'samtools view -N {1} {2} > {2.}.fasta' ::: *txt ::: *bam

With --dry-run, I can see that it runs every combination (3 .txt files × 3 .bam files = 9 commands); I only need the 3 matching pairs.

Any help?


Solution

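  • First, ${#bam[@]} expands to the number of elements in bam, so your loop runs exactly once, with i=3, where ${bam[3]} and ${ids[3]} are both empty.
    As Jetchisel mentioned in the comments, what you want is "${!bam[@]}", which expands to the array's indices.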

    $: touch {1..3}.{bam,txt}
    $: ls
    1.bam  1.txt  2.bam  2.txt  3.bam  3.txt
    
    $: bam=(*.bam); ids=(*.txt)
    $: echo "${bam[@]} ${ids[@]}"
    1.bam 2.bam 3.bam 1.txt 2.txt 3.txt
    
    $: for i in ${#bam[@]}; do f=${bam[i]}; e=${ids[i]}; echo "i='$i' f='$f' e='$e'"; done
    i='3' f='' e=''
    
    $: echo "#:'${#bam[@]}' !:'${!bam[@]}'"
    #:'3' !:'0 1 2'
    
    $: for i in ${!bam[@]}; do f=${bam[i]}; e=${ids[i]}; echo "i='$i' f='$f' e='$e'"; done
    i='0' f='1.bam' e='1.txt'
    i='1' f='2.bam' e='2.txt'
    i='2' f='3.bam' e='3.txt'
    

    This assumes the two arrays really are parallel: both globs sort their files into the same order, and no file is missing from either set. I'll leave error checking for that to you unless you want a deeper dive.

    There are other ways to approach it. You can also iterate over one set and derive the other filename with parameter expansion (${b%.bam} strips the .bam suffix). That only relies on each .bam having a matching .txt, not on the two arrays lining up:

    $: for b in ${bam[@]}; do echo "b='$b' t='${b%.bam}.txt'"; done
    b='1.bam' t='1.txt'
    b='2.bam' t='2.txt'
    b='3.bam' t='3.txt'
    

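    You can also build both filenames from a shared stem with brace expansion. Two things to watch: the indices from ${!bam[@]} start at 0 while these filenames start at 1, so take the stem from the filename rather than using the index; and -N takes the read-name file as its argument, so the .txt has to come before the .bam:

    $: for b in "${bam[@]}"; do i=${b%.bam}; echo "samtools view -N" $i.{txt,bam}; done
    samtools view -N 1.txt 1.bam
    samtools view -N 2.txt 2.bam
    samtools view -N 3.txt 3.bam
    

    So I'd do it something like this, running each pair as a background job and then waiting for all of them to finish:

    for b in "${bam[@]}"
    do i=${b%.bam}
       samtools view -N $i.{txt,bam} > $i.fasta 2> $i.log &
    done
    wait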
    

    As a habit I'd usually quote those vars, but here they all hold plain integers, and the brace expression itself must stay unquoted or it won't expand, so I've just left the quotes off this time.
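On the quoting point: only the brace expression has to stay outside the quotes. A quoted "$stem" glued directly to .{txt,bam} is still a single word, so brace expansion copies the quoted part into each result. With plain integer stems, as here, it makes no difference; it matters once names can contain spaces. A small sketch, using a hypothetical stem with a space in it:

```shell
#!/usr/bin/env bash
# Sketch: quote the variable, leave the {txt,bam} brace expression unquoted.

stem='sample 1'   # hypothetical stem containing a space

# Quoted variable glued to unquoted braces: two names, each kept whole.
quoted=( "$stem".{txt,bam} )
printf '[%s]\n' "${quoted[@]}"
# [sample 1.txt]
# [sample 1.bam]

# Quoting the braces too suppresses brace expansion entirely.
literal=( "$stem.{txt,bam}" )
printf '[%s]\n' "${literal[@]}"
# [sample 1.{txt,bam}]

# Leaving $stem unquoted lets word splitting break each name in two.
unquoted=( $stem.{txt,bam} )
printf '[%s]\n' "${unquoted[@]}"
# [sample]
# [1.txt]
# [sample]
# [1.bam]
```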
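If you'd rather not rely on the "truly parallel" assumption, one option is to pair files by name and report any .bam that has no partner. This is a minimal sketch, assuming the <stem>.bam / <stem>.txt naming from the question; pair_files is a hypothetical helper name, and echo stands in for samtools so you can check the pairs before running anything:

```shell
#!/usr/bin/env bash
# Sketch: pair <stem>.bam with <stem>.txt by name instead of by array position,
# and report any .bam that has no matching .txt.
# Assumes the <stem>.bam / <stem>.txt naming from the question.
# pair_files is a hypothetical helper, not part of samtools or bash.

pair_files() {
  local b stem t
  for b in *.bam; do
    [[ -e $b ]] || continue   # no .bam files: the glob stays literal "*.bam", so skip it
    stem=${b%.bam}            # 1.bam -> 1
    t=$stem.txt
    if [[ -f $t ]]; then
      # echo stands in for the real command; drop it once the pairs look right
      echo "samtools view -N $t $b"
    else
      echo "missing $t for $b" >&2
    fi
  done
}

# Demo in a throwaway directory so no real files are touched.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch 1.bam 1.txt 2.bam 2.txt 3.bam   # 3.txt deliberately left out
pair_files
```

In the demo, the first two pairs are printed on stdout and "missing 3.txt for 3.bam" goes to stderr, so a gap in either set shows up instead of silently shifting every later pair.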