Tags: bash, shell, pipe, divide, fastq

Divide output of wc -l by 4 in a for loop in bash?


I'm trying to write a for loop that unzips the fastq.gz files that contain R1 in the file name, counts the number of lines in each file, and divides that count by 4. Ideally, I could also write the results to a txt file with two columns (file name and number of lines / 4).

This loop unzips the R1 fastq files and determines the number of lines in each file, but it does not divide by 4 (or save the output to a txt file).

for i in $(ls ./*R1*);
do
gzcat ./$i | wc -l
done;

Other posts on here suggest using bc to divide in bash, but I haven't been able to integrate this into a loop.
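Since bc came up: one way to use it inside a loop is to build the arithmetic expression as a string and pipe it to bc. This is a minimal sketch that assumes bc is installed, uses gzip -dc in place of gzcat (so it also runs on Linux), and wraps the loop in a hypothetical function, reads_bc, that takes the file names as arguments. Setting scale=2 keeps two decimal places, so a line count that is not a multiple of 4 shows up as a fraction rather than being silently rounded down.

```shell
#!/usr/bin/env bash
# reads_bc FILE...  (hypothetical helper name)
# Prints "<file><TAB><lines/4>" for each non-empty regular file, doing the
# division with bc so that fractional results are kept.
reads_bc() {
  local f nlines
  for f in "$@"; do
    [ -f "$f" ] || continue                 # skip directories / missing names
    [ -s "$f" ] || continue                 # skip empty files
    nlines=$(gzip -dc "$f" | wc -l)         # decompress to stdout, count lines
    # scale=2 sets two digits after the decimal point for the division
    printf '%s\t%s\n' "$f" "$(echo "scale=2; $nlines / 4" | bc)"
  done
}
```

Usage: reads_bc ./*R1*.gz > newfile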


Solution

  • Never use for i in $(ls anything); see Bash Pitfalls #1. Your loop will fail for filenames containing spaces, glob characters, or other special characters, because the output of ls is word-split and glob-expanded. In most cases, simply iterate over the files with a glob, for i in path/*; do ..., which is safe for any filename, even one containing a '\n' character; just be aware that if nothing matches, the loop runs once with the literal pattern (unless you set shopt -s nullglob). If you also need to search subdirectories, use find with NUL-delimited output: while IFS= read -r -d '' name; do ... done < <(find path -type f -name "*.gz" -print0) (note: process substitution, < <(...), is a bash-only construct; if using a POSIX shell, pipe find into the loop instead)

    Next, to write the name and the number of lines / 4 to a new file, group your entire loop inside { ... } and redirect all of its output to the new file at once.

    You should also add checks that skip anything matching the pattern that is a directory (e.g., a directory whose name ends in gz), as well as any empty file (zero file size).

    Putting it all together, you could do something like:

    {
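    for i in ./*R1*.gz; do                    ## files with R1 in the name
      [ -d "$i" ] && continue                 ## skip any directories
      [ -s "$i" ] || continue                 ## skip empty files (and an unmatched pattern)
      nlines=$(gzcat "$i" | wc -l)            ## get number of lines (gzcat on macOS; zcat or gzip -dc on Linux)
      printf "%s\t%s\n" "$i" "$((nlines / 4))"  ## output name, nlines / 4 (4 lines per FASTQ record)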
    done
    } > newfile         ## redirect all output to newfile
    

    (The output uses a tab character, "\t", to separate the name from the number of lines / 4; adjust as desired.)

    Look things over and let me know if you have any questions.
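For the recursive find approach mentioned at the start, a minimal sketch of the same loop might look like the following. It assumes bash (for process substitution and read -d ''), uses gzip -dc in place of gzcat so it also runs on Linux, and wraps everything in a hypothetical function, count_reads. Because find's -type f only returns regular files, the separate directory check is no longer needed.

```shell
#!/usr/bin/env bash
# count_reads DIR  (hypothetical helper name)
# Prints "<file><TAB><lines/4>" for every non-empty regular file under DIR
# (subdirectories included) whose name matches *R1*.gz.
count_reads() {
  local dir=${1:-.} f nlines
  # -print0 and read -d '' pass each name NUL-terminated, so spaces and
  # newlines in file names survive intact.
  while IFS= read -r -d '' f; do
    [ -s "$f" ] || continue                     # skip empty files
    nlines=$(gzip -dc "$f" | wc -l)             # decompress to stdout, count lines
    printf '%s\t%s\n' "$f" "$((nlines / 4))"    # one FASTQ record = 4 lines
  done < <(find "$dir" -type f -name '*R1*.gz' -print0)  # -type f skips directories
}
```

Usage: count_reads . > newfile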