I have filenames like the following:
fastqs/hgmm_100_S1_L001_R1_001.fastq.gz
fastqs/hgmm_100_S1_L002_R1_001.fastq.gz
fastqs/hgmm_100_S1_L003_R1_001.fastq.gz
fastqs/hgmm_100_S1_L001_R2_001.fastq.gz
fastqs/hgmm_100_S1_L002_R2_001.fastq.gz
fastqs/hgmm_100_S1_L003_R2_001.fastq.gz
I want to merge them into the two groups shown above (R1 and R2), concatenating across the LXXX lane numbers.
I can do it like this:
cat fastqs/hgmm_100_S1_L00?_R1_001.fastq.gz > data/hgmm_100_S1_R1_001.fastq.gz
cat fastqs/hgmm_100_S1_L00?_R2_001.fastq.gz > data/hgmm_100_S1_R2_001.fastq.gz
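These cat commands produce valid output because a gzip file may consist of several concatenated members (RFC 1952), and gzip -dc decompresses all of them in sequence. Here is a minimal sketch that checks this on made-up contents in a temporary directory. The file names part1.gz and part2.gz are hypothetical. Whether a particular downstream tool reads every member is worth confirming for your own pipeline.

```shell
#!/usr/bin/env bash
# Sketch: concatenated gzip files decompress to the concatenated data.
# File names and contents are hypothetical test data.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf 'read1\n' | gzip > "$tmp/part1.gz"
printf 'read2\n' | gzip > "$tmp/part2.gz"

# Same operation as in the question: plain byte-level concatenation.
cat "$tmp/part1.gz" "$tmp/part2.gz" > "$tmp/merged.gz"

# gzip -dc decompresses each member in turn.
merged_text=$(gzip -dc "$tmp/merged.gz")
printf '%s\n' "$merged_text"
```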
But this requires me to hard-code each file group. How can I set it up so that it merges all of the L values into one group and writes an output file whose name is the same as the input file names, just without the L### part?
Thanks, Jack
EDIT:
Sorry for not including this in the original post, but what if I had something like:
fastqs/hgmm_100_S1_L001_R1_001.fastq.gz
fastqs/hgmm_100_S1_L002_R1_001.fastq.gz
fastqs/hgmm_100_S1_L003_R1_001.fastq.gz
fastqs/hgmm_200_S1_L001_R2_001.fastq.gz
fastqs/hgmm_200_S1_L002_R2_001.fastq.gz
fastqs/hgmm_200_S1_L003_R2_001.fastq.gz
(The only change is at the very beginning: 100 -> 200.)
How would this work? Essentially, I want to merge these files as long as every part of the name except L??? is identical.
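"Identical except for L???" amounts to a grouping key: delete the lane field and whatever remains names the group. Here is a sketch of that idea with sed. It assumes the lane field is always _L followed by exactly three digits and appears only once per name. The file list is the one from the edit.

```shell
#!/usr/bin/env bash
# Sketch: derive a grouping key by deleting the _L###_ lane field.
# Assumes the lane field is "_L" + three digits and occurs once per name.
set -eu

# sort -u keeps one line per group; files that share a key belong together.
keys=$(printf '%s\n' \
  fastqs/hgmm_100_S1_L001_R1_001.fastq.gz \
  fastqs/hgmm_100_S1_L002_R1_001.fastq.gz \
  fastqs/hgmm_100_S1_L003_R1_001.fastq.gz \
  fastqs/hgmm_200_S1_L001_R2_001.fastq.gz \
  fastqs/hgmm_200_S1_L002_R2_001.fastq.gz \
  fastqs/hgmm_200_S1_L003_R2_001.fastq.gz \
  | sed -E 's/_L[0-9]{3}_/_/' | sort -u)

printf '%s\n' "$keys"
```

This prints two keys, one for the hgmm_100 R1 files and one for the hgmm_200 R2 files. Those keys are also the desired output names, apart from the directory.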
If the pattern _L###_ appears only once in each filename (in the lane field), you might try something like this:
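#!/usr/bin/env bash
# Define an associative array. Requires bash 4+.
declare -A a
# Use extended glob notation; see "Pattern Matching" in the bash man page.
shopt -s extglob
# If nothing matches, skip the loop instead of iterating over the literal pattern.
shopt -s nullglob
# Collect one glob pattern per group by storing it as an array key.
# Every lane file of a group maps to the same key, so duplicates collapse.
for f in fastqs/*_L+([0-9])_*.fastq.gz; do
  a["${f/_L+([0-9])_/_*_}"]=1
done
# Make sure the output directory exists.
mkdir -p data
# And finally, gather your files.
for f in "${!a[@]}"; do
  # Strip any existing directory part of the filename to build our target
  target="data/${f##*/}"
  # $f is deliberately unquoted so the shell expands the glob into the
  # group's lane files (in sorted order: L001, L002, ...), then drop "*_"
  cat $f > "${target/[*]_/}"
done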
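The script above needs bash 4+ for declare -A. On macOS, for example, /bin/bash is still version 3.2. The sketch below is a possible alternative that avoids associative arrays, and it uses a different technique from the script above. First it empties each target. Then it appends lane files in glob order, which is sorted, so L001 comes before L002. The helper names merge_lanes and target_for and the sample data are hypothetical. Like the script above, it assumes the lane field looks like _L###_.

```shell
#!/usr/bin/env bash
# Sketch: lane merge without associative arrays (should work in bash 3.2).
# merge_lanes/target_for are hypothetical helpers; assumes _L###_ lane fields.
set -eu

lane_glob='*_L[0-9][0-9][0-9]_*.fastq.gz'

# Print the output path for one input file: drop the directory and _L###_.
target_for() {
  printf '%s/%s\n' "$2" "$(basename "$1" | sed -E 's/_L[0-9]{3}_/_/')"
}

merge_lanes() {
  local src_dir=$1 out_dir=$2 f
  mkdir -p "$out_dir"
  # Pass 1: create or empty every target so a re-run does not append twice.
  for f in "$src_dir"/$lane_glob; do
    [ -e "$f" ] || continue   # no match: the pattern stayed literal
    : > "$(target_for "$f" "$out_dir")"
  done
  # Pass 2: glob results are sorted, so lanes are appended L001, L002, ...
  for f in "$src_dir"/$lane_glob; do
    [ -e "$f" ] || continue
    cat "$f" >> "$(target_for "$f" "$out_dir")"
  done
}

# Usage on throwaway data in a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/fastqs"
for lane in 1 2 3; do
  for read in 1 2; do
    printf 'lane%s_R%s\n' "$lane" "$read" \
      | gzip > "$tmp/fastqs/hgmm_100_S1_L00${lane}_R${read}_001.fastq.gz"
  done
done
merge_lanes "$tmp/fastqs" "$tmp/data"
gzip -dc "$tmp/data/hgmm_100_S1_R1_001.fastq.gz"
```

The trade-off is two passes over the files in exchange for running on older bash versions.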
${!a[@]} lets us step through an array's indices (for an associative array, its keys) rather than its values.
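Here is a small illustration of keys versus values. It also shows why the script stores each pattern as a key: assigning the same key again leaves only one entry. The keys below are example patterns of the kind the first loop produces. Key order in a bash associative array is unspecified, so the sketch doesn't rely on it.

```shell
#!/usr/bin/env bash
# Requires bash 4+ (declare -A). Keys are example glob patterns.
declare -A a
a["fastqs/hgmm_100_S1_*_R1_001.fastq.gz"]=1
a["fastqs/hgmm_100_S1_*_R2_001.fastq.gz"]=1
# Same key again, as happens for each further lane file: still one entry.
a["fastqs/hgmm_100_S1_*_R1_001.fastq.gz"]=1

values=("${a[@]}")   # the values: all 1 here
keys=("${!a[@]}")    # the keys: the glob patterns themselves
printf 'key: %s\n' "${keys[@]}"
printf 'number of entries: %s\n' "${#a[@]}"
```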