
Bash selecting files from directory listing stored in a string


I have, let's say, 50 folders, each containing a different number of file pairs that serve as input for a command-line tool.

#for f in ./*shuf; do #lists all the directories
    #FILES=${f}/*.fastq #to get all the fastq files in the directory

    FILES="./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_f.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121103_1_f.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121103_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121214_1_f.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121214_1_r.fastq"

I need to split the files into their pairs (one r and one f file for each name), so that a single pair looks like this:

echo $PAIR

./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_f.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_r.fastq

I will use this as input, which needs to be in this format (the output of `basename ${PAIR%_*}`, followed by $PAIR):
 C115_7.121017_1 ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_f.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_r.fastq

And then loop through all the pairs.

I was attempting to do this with:

IFS=' ' read -ra ADDR <<< "$FILES"
echo "${ADDR[ ]}"

but I'm stuck on the error ${ADDR[ ]}: bad substitution. Could you please include an explanation of the method, as I really want to learn?
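The error reported above most likely comes from the subscript: `${ADDR[ ]}` has a blank where an index (or `@` for all elements) belongs. A minimal sketch of the split, assuming the paths contain no whitespace; the shortened `FILES` value is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical, shortened stand-in for the real FILES string.
FILES="./d.shuf/A.1_1_f.fastq ./d.shuf/A.1_1_r.fastq"

# Split on spaces into the array ADDR.
# -r keeps backslashes literal, -a stores the fields as array elements.
IFS=' ' read -ra ADDR <<< "$FILES"

echo "${#ADDR[@]}"           # number of elements
echo "${ADDR[0]}"            # first element
printf '%s\n' "${ADDR[@]}"   # every element, one per line
```

Quoting `"${ADDR[@]}"` keeps each element as a separate word, which matters as soon as the array is passed to a loop or a command.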

EDIT:

To clarify a bit:

This produces roughly the output I am looking for:

 IFS=' ' read -ra ADDR <<< "$FILES"
 pairs="${ADDR[@]}"
 for afile in ${pairs}; do bfile=${afile%_*}; echo ${bfile}_r.fastq ${bfile}_f.fastq; done

but without the duplicated lines:

./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_f.fastq
./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_f.fastq
./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121103_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121103_1_f.fastq
./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121103_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121103_1_f.fastq
./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121214_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121214_1_f.fastq
./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121214_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121214_1_f.fastq
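The duplication arises because the loop runs once per file rather than once per pair. One way to avoid it, sketched here, is to act only on the f entries and derive each r partner from them. This assumes every `_f.fastq` file has a matching `_r.fastq` and that paths contain no spaces; the shortened `FILES` value and the `PAIRS` array are hypothetical names used for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical, shortened stand-in for the real FILES string.
FILES="./d.shuf/A.1_1_f.fastq ./d.shuf/A.1_1_r.fastq ./d.shuf/A.2_1_f.fastq ./d.shuf/A.2_1_r.fastq"

IFS=' ' read -ra ADDR <<< "$FILES"

PAIRS=()   # one "name f-file r-file" line per pair
for afile in "${ADDR[@]}"; do
    # Skip everything except forward reads, so each pair is handled once.
    [[ $afile == *_f.fastq ]] || continue
    stem=${afile%_f.fastq}   # path without the _f.fastq suffix
    name=${stem##*/}         # file name part only, like basename
    PAIRS+=("$name ${stem}_f.fastq ${stem}_r.fastq")
done

printf '%s\n' "${PAIRS[@]}"
```

This version needs only indexed arrays, so it does not depend on associative-array support.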

Solution

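    # Make globs that match nothing expand to nothing instead of themselves.
    shopt -s nullglob
    
    KEYS=()             # pair names, in the order they are first seen
    declare -A MAP=()   # pair name -> space-separated list of its files
    
    for D in ./*shuf; do
        for F in "$D"/*.fastq; do
            # Strip the directory, then the trailing _f.fastq or _r.fastq.
            KEY=${F##*/} KEY=${KEY%_*}
            # Record the name the first time it appears.
            [[ -z ${MAP[$KEY]} ]] && KEYS+=("$KEY")
            MAP[$KEY]+=" $F"
        done
        # Print one line per pair: the name, then its files.
        for KEY in "${KEYS[@]}"; do
            echo "${KEY}${MAP[$KEY]}"
        done
        # Reset for the next directory.
        KEYS=()
        MAP=()
    done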
    

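    Or, if the pair names are unique across directories, collect everything first and print it at the end: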

    shopt -s nullglob
    
    KEYS=()
    declare -A MAP=()
    
    for D in ./*shuf; do
        for F in "$D"/*.fastq; do
            KEY=${F##*/} KEY=${KEY%_*}
            [[ -z ${MAP[$KEY]} ]] && KEYS+=("$KEY")
            MAP[$KEY]+=" $F"
        done
    done
    
    for KEY in "${KEYS[@]}"; do
        echo "${KEY}${MAP[$KEY]}"
    done
    

    Both versions use an associative array (declare -A), so they need Bash 4.0 or newer. Good luck.
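Since the question asked for an explanation: the pair name is derived with two parameter expansions, and `MAP` then gathers every file that shares that name, while `KEYS` remembers the order in which names were first seen (associative arrays do not preserve insertion order). A small sketch of the key derivation alone, using a hypothetical directory name:

```shell
# Hypothetical path in the same shape as the files in the question.
F=./d.shuf/C115_7.121017_1_f.fastq

KEY=${F##*/}    # remove the longest prefix matching "*/", leaving the file name
KEY=${KEY%_*}   # remove the shortest suffix matching "_*", i.e. _f.fastq

echo "$KEY"     # prints C115_7.121017_1
```

Because `%` removes the shortest match, only the final `_f.fastq` or `_r.fastq` is dropped, so both files of a pair end up with the same key.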