Search code examples
awkprocess-substitution

piping awk outputs into another awk command


I have a tab delimited file as such:

1   68082   68082   0.003   0.0984  0.0845750981305074  1
1   69428   69428   0.0015  0.0497  0.04367900961171486 1
1   69761   69761   0.0045  0.1034  0.09005130799195755 1
1   69899   69899   0.0106  0.001   0.012825357055808327    1
1   70352   70352   0.0356  0.002   0.04128979333631639 1
1   136113  136113  0.0015  0.0278  0.02540996544374495 1
1   138396  138396  0.0008  0.0089  0.008567211104293392    1
1   872352  872352  0.4955  0.2803  0.48119634372979975 1
1   872467  872467  0.0121  0.004   0.01705890110859077 1
1   872564  872564  0.0015  0.002   0.0034277132094182  1

I would like to get the sum of column 6:

awk -F'[\t]' '{ total += $6 } END { print total }' file

And the sum of column 7:

awk -F'[\t]' '{ total += $7 } END { print total }' file

And then divide the sum of column 6 by the sum of column 7.

Is it possible to do this entirely in one command? Something like process substitution in bash?


Solution

  • Following awk may help you in same if we need to print only the 6th and 7th fields sum.

    awk '{sum6+=$6;sum7+=$7} END{print "Sum of column 6:",sum6 RS "sum of column 7:",sum7;print "Divide of sum6 and sum7",sum6/sum7}' Input_file
    

    Output will be as follows.

    Sum of column 6: 0.808081
    sum of column 7: 10
    Divide of sum6 and sum7 0.0808081
    

    EDIT: Solution2: Adding a non-one liner form of solution along with Sir Ed Morton's suggestion which includes logic in case sum of 7th column is 0 then it will not throw error in output.

    awk '{
      sum6+=$6;
      sum7+=$7
    }
    END{
      print "Sum of column 6:",sum6 ORS "sum of column 7:",sum7;print "Divide of sum6 and sum7",sum7?sum6/sum7:"you are trying to divide with a 0, seems sum7 is zero."
    }'   Input_file