Search code examples
awkcountunique

sum rows based on unique columns awk


I'm looking for a more elegant way to do this (for more than >100 columns):

awk '{a[$1]+=$4}{b[$1]+=$5}{c[$1]+=$6}{d[$1]+=$7}{e[$1]+=$8}{f[$1]+=$9}{g[$1]+=$10}END{for(i in a) print i,a[i],b[i],c[i],d[i],e[i],f[i],g[i]}'

Here is the input:

 a1 1   1   2   2
 a2 2   5   3   7
 a2 2   3   3   8
 a3 1   4   6   1
 a3 1   7   9   4
 a3 1   2   4   2

and output:

 a1 1 1 2 2
 a2 4 8 6 15
 a3 3 13 19 7

Thanks :)


Solution

  • I break the one-liner down into lines, to make it easier to read.

    awk '{n[$1];for(i=2;i<=NF;i++)a[$1,i]+=$i}
        END{for(x in n){
            printf "%s ", x
            for(y=2;y<=NF;y++)printf "%s%s", a[x,y],(y==NF?ORS:FS)
            }
        }' file
    

    this awk command should work with your 100 columns file.

    test with your file:

    kent$  cat f
    a1 1   1   2   2
    a2 2   5   3   7
    a2 2   3   3   8
    a3 1   4   6   1
    a3 1   7   9   4
    a3 1   2   4   2
    
    kent$  awk '{n[$1];for(i=2;i<=NF;i++)a[$1,i]+=$i}END{for(x in n){printf "%s ", x;for(y=2;y<=NF;y++)printf "%s%s", a[x,y],(y==NF?ORS:OFS)}}' f
    a1 1 1 2 2
    a2 4 8 6 15
    a3 3 13 19 7