
Linux: merge multiple BAM files with Picard


I have ten directories, each containing around 10-12 BAM files. I need to use Picard to merge the BAM files in each directory, and I'm looking for a better way to do it than typing every input by hand.

Basic command:
java -jar picard.jar MergeSamFiles \
  I=input_1.bam \
  I=input_2.bam \
  O=merged_files.bam

Directory 1:
java -jar picard.jar MergeSamFiles \
  I=input_16.bam \
  I=input_28.bam \
  I=input_81.bam \
  I=input_34.bam \
  ... \
  ... \
  I=input_10.bam \
  O=merged_files.bam

Directory 2:
java -jar picard.jar MergeSamFiles \
  I=input_44.bam \
  I=input_65.bam \
  I=input_181.bam \
  I=input_384.bam \
  ... \
  ... \
  I=input_150.bam \
  O=merged_files.bam

How can I pass the input files through a variable when their names are not sequential? I would also like to loop over the ten directories, but each one contains a different number of BAM files.

Should I use Python or R for this, or keep using a shell script? Please advise.


Solution

  • Why not use samtools?

    for folder in my_bam_folders/*/; do
        folder=${folder%/}    # drop trailing slash so the output is my_bam_folders/<dir>.bam
        samtools merge "$folder.bam" "$folder"/*.bam
    done
    

    More generally, when run inside a directory, samtools merge can combine all of the BAM files in it like this:

    samtools merge merged.bam *.bam
    

    EDIT: If samtools isn't an option and you have to use Picard, what about something like this?

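    for folder in my_bam_folders/*/; do
        folder=${folder%/}    # drop trailing slash; output goes to my_bam_folders/<dir>.bam
        args=()
        for f in "$folder"/*.bam; do args+=("I=$f"); done    # one I= per BAM, any names, any count
        java -jar picard.jar MergeSamFiles "${args[@]}" O="$folder.bam"
    done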
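Before launching ten merges, it can help to preview the exact Picard command each directory produces. The following is a minimal bash sketch assuming the same `my_bam_folders/<dir>/*.bam` layout; the function name `print_merge_cmd` is hypothetical. It also enables `nullglob`, so a directory with no BAM files is skipped instead of handing the literal pattern `*.bam` to Picard.

```shell
# Dry run: print the Picard command for each directory instead of running it.
shopt -s nullglob   # an unmatched glob expands to nothing, not the literal pattern

# print_merge_cmd is a hypothetical helper name, not part of Picard.
print_merge_cmd() {
    local folder=${1%/}   # accept "dir" or "dir/"
    local args=()
    local f
    for f in "$folder"/*.bam; do
        args+=("I=$f")    # one I= argument per BAM file found
    done
    if (( ${#args[@]} == 0 )); then
        echo "skipping $folder: no BAM files" >&2
        return 1
    fi
    echo java -jar picard.jar MergeSamFiles "${args[@]}" "O=$folder.bam"
}

for folder in my_bam_folders/*/; do
    print_merge_cmd "$folder"
done
```

Once the printed commands look right, removing the `echo` in front of `java -jar` turns the preview into the real merge.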