I have a tab delimited file as such:
1 68082 68082 0.003 0.0984 0.0845750981305074 1
1 69428 69428 0.0015 0.0497 0.04367900961171486 1
1 69761 69761 0.0045 0.1034 0.09005130799195755 1
1 69899 69899 0.0106 0.001 0.012825357055808327 1
1 70352 70352 0.0356 0.002 0.04128979333631639 1
1 136113 136113 0.0015 0.0278 0.02540996544374495 1
1 138396 138396 0.0008 0.0089 0.008567211104293392 1
1 872352 872352 0.4955 0.2803 0.48119634372979975 1
1 872467 872467 0.0121 0.004 0.01705890110859077 1
1 872564 872564 0.0015 0.002 0.0034277132094182 1
I would like to get the sum of column 6:
awk -F'[\t]' '{ total += $6 } END { print total }' file
And the sum of column 7:
awk -F'[\t]' '{ total += $7 } END { print total }' file
And then divide the sum of column 6 by the sum of column 7.
Is it possible to do this entirely in one command? Something like process substitution in bash?
Following awk
may help you in same if we need to print only the 6th and 7th fields sum.
awk '{sum6+=$6;sum7+=$7} END{print "Sum of column 6:",sum6 RS "sum of column 7:",sum7;print "Divide of sum6 and sum7",sum6/sum7}' Input_file
Output will be as follows.
Sum of column 6: 0.808081
sum of column 7: 10
Divide of sum6 and sum7 0.0808081
EDIT: Solution2: Adding a non-one liner form of solution along with Sir Ed Morton's suggestion which includes logic in case sum of 7th column is 0
then it will not throw error in output.
awk '{
sum6+=$6;
sum7+=$7
}
END{
print "Sum of column 6:",sum6 ORS "sum of column 7:",sum7;print "Divide of sum6 and sum7",sum7?sum6/sum7:"you are trying to divide with a 0, seems sum7 is zero."
}' Input_file