bash loops command-line multiple-arguments

Looping over files in a folder for shell script with multiple inputs


Specifying multiple inputs for command line tool?

I am new to bash and want to run a command-line program over a folder containing numerous files.

The tool takes two input files. In my case, these differ in one field of the file name ("...R1" vs "...R2"). Running a single instance of the tool looks like this:

tool_name infile1 infile2 -o outfile_suffix

Actual example:

casper sample_name_R1_001.out.fastq sample_name_R2_001.out.fastq -o sample_name_merged

File name format:

DCP-137-5102-T1A3_S33_L001_R1_001.fastq
DCP-137-5102-T1A3_S33_L001_R2_001.fastq

The four-digit field (5102 above, shown in bold in the original post) will vary between different pairs (e.g., 2000, 2110, 5100, etc.), with the two files of each pair distinguished by either R1 or R2.

I would like to know how to loop the script over a folder containing numerous pairs of matched files, and also ensure that the output (-o) gets the 'sample_name' suffix.

I am familiar with the basic for file in ./*.*; do ... $file...; done but that obviously won't work for this example. Any suggestions would be appreciated!


Solution

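  • Loop over the R1 files and derive the R2 and merged-output names from each one, something like:

    for file1 in ./*_R1_*.fastq; do
        # Swap the first _R1_ for _R2_ to get the mate file
        file2=${file1/_R1_/_R2_}
        # Strip _R1 and everything after it, then append _merged
        merge=${file1%_R1*}_merged
        casper "$file1" "$file2" -o "$merge"
    done
    

    Note: ${file1%_R1*} removes the shortest trailing match of _R1*, leaving the sample name (e.g. ./DCP-137-5102-T1A3_S33_L001). Using ${file1#*R1} instead would strip the leading part and keep only _001.fastq, which is not what you want for the output name. The variables are quoted so that file names containing spaces are passed to casper intact.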
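
If some R1 files might be missing their R2 mate, a slightly more defensive variant may help. The following is only a sketch under assumptions: the helper names `mate_of`, `merged_name` and `merge_pairs`, and the `CASPER_CMD` variable (handy for a dry run with `echo`), are made up for illustration and are not part of casper. It assumes the paired files match `*_R1_*.fastq` and uses only prefix/suffix stripping, so it runs in bash and in plain POSIX sh.

```bash
# Hypothetical helper: R2 mate of an R1 path (splits at the last "_R1_").
mate_of() { printf '%s_R2_%s\n' "${1%_R1_*}" "${1##*_R1_}"; }

# Hypothetical helper: output name, i.e. everything before "_R1_" plus "_merged".
merged_name() { printf '%s_merged\n' "${1%_R1_*}"; }

# Run the tool once per R1/R2 pair found in the given directory (default: .).
# The body is a subshell "( ... )" so its variables do not leak to the caller.
merge_pairs() (
    dir=${1:-.}
    for file1 in "$dir"/*_R1_*.fastq; do
        [ -e "$file1" ] || continue      # the glob matched nothing
        file2=$(mate_of "$file1")
        if [ ! -e "$file2" ]; then
            echo "skipping $file1: no matching $file2" >&2
            continue
        fi
        # CASPER_CMD is an assumption for illustration; it defaults to casper.
        ${CASPER_CMD:-casper} "$file1" "$file2" -o "$(merged_name "$file1")"
    done
)
```

Running `CASPER_CMD=echo merge_pairs /path/to/fastq_dir` should print the commands that would be run without executing them; dropping `CASPER_CMD=echo` runs casper itself.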