Search code examples
csvunixawksummarize

Unix, group rows and sum values of columns from file.csv


I have this file.csv

"201707"|"51976551"|1|0|1|"20170702"
"201707"|"51955194"|1|0|0|"20170702"
"201707"|"51923555"|1|0|1|"20170702"
"201707"|"51976551"|1|0|1|"20170703"
"201707"|"51955194"|1|0|0|"20170703"
"201707"|"51923555"|1|0|1|"20170703"
"201707"|"51960597"|1|0|0|"20170703"

And my hope result is group by the number and sum the column 3, 4 and 5

"201707"|"51976551"|2|0|2
"201707"|"51955194"|2|0|0
"201707"|"51923555"|2|0|2
"201707"|"51960597"|1|0|0

I've tried with:

cat file.csv | awk -F"|" '
  { a[$2] += $3 }
  END {
    for (i in a) {
      printf "%s|%s\n", i, a[i];
    }
  }
'

And the result is:

"51976551"|2
"51955194"|2
"51923555"|2
"51960597"|1

Only shows the sum of third column, but I need 2 columns more. what should I do in this case?


Solution

  • Try:

    $ awk -F"|" '{ a[$1 OFS $2]+=$3; b[$1 OFS $2]+=$4; c[$1 OFS $2]+=$5 }
      END {
        for (i in a) {
          print i, a[i], b[i], c[i];
        }
      }
    ' OFS=\| file.csv
    "201707"|"51976551"|2|0|2
    "201707"|"51960597"|1|0|0
    "201707"|"51923555"|2|0|2
    "201707"|"51955194"|2|0|0
    

    How it works

    • -F"|"

      This sets the field separator on input to |.

    • a[$1 OFS $2]+=$3; b[$1 OFS $2]+=$4; c[$1 OFS $2]+=$5

      This keeps track of the totals of the third, fourth, and fifth columns.

    • END { for (i in a) { print i, a[i], b[i], c[i]; } }

      This prints out the results.

    • OFS=\|

      This tells awk to use | as the field separator on output.