After finishing an analysis I get a table with many columns and rows. Each time a new table is written, the number of rows and columns can vary, so I cannot predict how many of each there will be. Every row has an index in column one, but the indexes can repeat throughout the table. What I want is a grep/awk/bash way to collect all lines with the same index and sum each column, so that I end up with a single line per index holding the summed values. As an illustration:
Input table:
index,sampleA,sampleB,sampleC
nana,22,12,4
baba,47,4,5
nana,1,5,9
nana,7,5,8
After parsing:
index,sampleA,sampleB,sampleC
nana,30,22,21
baba,47,4,5
I would really appreciate any help with this. Many thanks.
A little long-winded, but something like this will do the job for the three-sample example:
awk -F"," 'BEGIN{OFS=FS} NR==1{print $0; next} NR>1{sampleA[$1]+=$2; sampleB[$1]+=$3; sampleC[$1]+=$4}END{for (sample in sampleA){print sample, sampleA[sample], sampleB[sample], sampleC[sample]}}' yourfile
Explanation:
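-F"," sets the input field separator to a comma.
BEGIN{OFS=FS} sets the output field separator to that same comma before any input is read.
NR==1{print $0; next} prints the header line unchanged and moves on to the next line.
NR>1{sampleA[$1]+=$2; sampleB[$1]+=$3; sampleC[$1]+=$4} for every later line, adds columns 2, 3 and 4 to three arrays, each keyed by the index in column 1.
END{for (sample in sampleA){print sample, sampleA[sample], sampleB[sample], sampleC[sample]}} after the last line, loops over the indexes and prints each one with its three sums. awk does not guarantee the order of a for-in loop, so the rows may not come out in input order.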
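Since the question says the number of columns can change from table to table, here is a rough sketch of a more general variant that does not name the columns. It assumes a comma-separated file with one header line and numeric values in every column after the first. The file names table.csv and summed.csv are placeholders for illustration, and the sample data is just the table from the question.

```shell
# Hypothetical sample input; replace table.csv with your own file.
cat > table.csv <<'EOF'
index,sampleA,sampleB,sampleC
nana,22,12,4
baba,47,4,5
nana,1,5,9
nana,7,5,8
EOF

awk -F',' '
BEGIN { OFS = FS }
NR == 1 { print; next }            # pass the header through unchanged
{
    if (!($1 in seen)) {           # remember each index in first-seen order
        seen[$1] = 1
        order[++n] = $1
    }
    if (NF > maxnf) maxnf = NF     # track the widest row
    for (i = 2; i <= NF; i++)      # add every value column to its running sum
        sum[$1, i] += $i
}
END {
    for (k = 1; k <= n; k++) {     # print indexes in the order they first appeared
        line = order[k]
        for (i = 2; i <= maxnf; i++)
            line = line OFS (sum[order[k], i] + 0)
        print line
    }
}' table.csv > summed.csv

cat summed.csv
```

The sums are stored in one array keyed by index and column number, so the same script works whether the table has three value columns or thirty. Keeping a separate order array is a design choice to print the rows in first-seen order instead of relying on for-in.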