I'd like to use gawk to group some data and do calculations on a csv file.
Raw sample data:
2600,AEIOU-2600,stack,2,04/01/2015,C C S,S,10.65
2600,AEIOU-2600,stack,3,04/20/2015,C C R,S,100
2600,AEIOU-2600,stack,1,04/28/2015,C C R,S,1.07
2600,AEIOU-2600,stack,4,04/29/2015,C C R,S,200
2601,"over, L.P. - 00001",stack,0,04/01/2015,C C S,s,50
2601,"over, L.P. - 00001",stack,1,04/01/2015,C C S,s,16.43
2601,"over, L.P. - 00001",stack,2,04/10/2015,D C S,s,17.16
2602,UEIA,stack,2,04/19/2015,C C S,s,500
2602,UEIA,stack,2,04/20/2015,C C S,s,50
2602,UEIA,stack,1,04/28/2015,C C S,s,10
2602,UEIA,stack,2,04/28/2015,C C S,s,30
2602,UEIA,stack,2,04/29/2015,C C S,s,40
2603,EDM,stack,1,04/01/2015,A,S,100
2603,EDM,stack,1,04/03/2015,A,S,100
2603,EDM,stack,1,04/04/2015,A,S,300
2603,EDM,stack,1,04/05/2015,A,S,600
2603,EDM,stack,1,04/06/2015,A,S,50
Field $1: numeric
Field $2: name
Field $3: account
Field $4: invoice number
Field $5: date
Field $6: type
Field $7: state
Field $8: amount
Each row in the csv file needs to be grouped by $1, but also all like fields in field $6 need to be added together only if they are NOT C C R or A.
Expected output:
2600,AEIOU-2600,C C R,3
2600,AEIOU-2600,C C S,1,10.65
2601,"over, L.P. - 00001",C C S,2,66.43
2601,"over, L.P. - 000001",D C S,1,17.16
2602,UEIA,C C S,5,630
2603,EDM,A,4
As wee see, all the like types are grouped, counted, and added together. The only exception is if the type is A or C C R, those are just counted.
Field $1: numeric
Field $2: name
Field $3: type
Field $4: count of type
Field $5: addition of types
I'll have to use gawk because some of the name fields contain double quotes.
I have this but it doesn't group by $1 AND $6
#!/usr/local/bin/gawk -f
BEGIN {
FPAT = "\"[^\"]*\"|[^,]*"
OFS = ","
}
NR > 1 {
arr[$1 OFS $2 OFS $6 OFS $8]++
}
END {
for (key in arr)
print key, arr[key]
}
Output:
2600,AEIOU-2600,C C R,1.07,1
2600,AEIOU-2600,C C R,100,1
2600,AEIOU-2600,C C R,200,1
2601,"over, L.P. - 00001",C C S,16.43,1
2601,"over, L.P. - 00001",C C S,50,1
2601,"over, L.P. - 00001",D C S,17.16,1
2602,UEIA,C C S,10,1
2602,UEIA,C C S,30,1
2602,UEIA,C C S,40,1
2602,UEIA,C C S,50,1
2602,UEIA,C C S,500,1
2603,EDM,A,100,2
2603,EDM,A,300,1
2603,EDM,A,50,1
2603,EDM,A,600,1
$ cat tst.awk
BEGIN { FPAT="([^,]*)|(\"[^\"]+\")"; OFS="," }
{
cnt[$1][$6]++
sum[$1][$6]+=$NF
name[$1][$6]=$2
}
END {
for (numeric in cnt) {
for (type in cnt[numeric]) {
print numeric, name[numeric][type], type, cnt[numeric][type] (type ~ /^(C C R|A)$/? "" : OFS sum[numeric][type])
}
}
}
$ awk -f tst.awk file
2600,AEIOU-2600,C C R,3
2600,AEIOU-2600,C C S,1,10.65
2601,"over, L.P. - 00001",D C S,1,17.16
2601,"over, L.P. - 00001",C C S,2,66.43
2602,UEIA,C C S,5,630
2603,EDM,A,5