BIG EDIT
I had a misunderstanding on the calculations so I'll need to re-do this.
Raw sample data:
2600,AEIOU-2600,stack,01,04/28/2015,C C R,S,1.07
2600,AEIOU-2600,stack,02,04/01/2015,C C S,S,10.65
2601,"over, L.P. - 00001",stack,01,04/01/2015,C C S,s,16.43
2601,"over, L.P. - 000001",stack,02,04/01/2015,D C S,s,17.16
2602,UEIA,stack,01,04/28/2015,C C R,s,10
2602,UEIA,stack,02,04/28/2015,C C R,s,20
Field $1:numeric
Field $2: name
Field $3: account
Field $4: invoice number
Field $5: date
Field $6: type
Field $7: state
Field $8: amount
Script:
#!/usr/local/bin/gawk -f
BEGIN {
FPAT = "\"[^\"]*\"|[^,]*"
OFS = ","
}
NR > 1 {
arr[$1 OFS $2 OFS $6]++
}
END {
for (key in arr)
print key, arr[key]
}
Output of script above:
2601,"over, L.P. - 000001",D C S,1
2601,"over, L.P. - 00001",C C S,1
2602,UEIA,C C R,2
2600,AEIOU-2600,C C S,1
I need to group based on field $1 and then field $6.
Once the script runs, it will generate data that looks like this:
2600,AEIOU-2600,C C R,1,1.07
2600,AEIOU-2600,C C S,1,10.65
2601,"over, L.P. - 00001",C C S,2,33.59
2602,UEIA,C C R,2,30
We see that 2600 had two different types so they each have their own line, and we see 2601 had two C C S combined are 33.59. Basically grouping on field $1 and then on field $6.
THEN I can multiply results for C C R and C C A at .02 (number of occurrences) and everything else at its total * .02*.0013.
2600,AEIOU-2600,C C R,1,1.07,0.0214
2600,AEIOU-2600,C C S,1 10.65,0.0002769
2601,"over, L.P. - 00001",C C S,2,33.59,0.00087334
2602,UEIA,C C R,2,30,0.00078
Field $1:numeric
Field $2: name
Field $3: type
Field $4: occurrences
Field $5 total
Field $5: calc_total
EDIT
#!/usr/local/bin/gawk -f
BEGIN {
FPAT = "\"[^\"]*\"|[^,]*"
OFS = ","
}
NR > 1 {
arr[$1 OFS $2 OFS $6 OFS $8]++
}
END {
for (key in arr)
print key, arr[key]
}
This is getting better but now I need a second group and group it by field $3. Right now the above script groups by $1 and then $5.
Is this what you're looking for?
$ cat tst.awk
BEGIN { FPAT="([^,]*)|(\"[^\"]+\")"; OFS="," }
{ print $0, $4 * .02 * ($3 ~ /C C [RA]/ ? 1 : .013) }
$ awk -f tst.awk file
30409,DPUMA - 147803,D C S,2,0.00052
30392,"SNI TIC , L.P. - 93001",C C S,175,0.0455
30425,QJEU - 98701,C C R,1,0.02
30397,"LSRNC , L.P. - 01236",D C S,124,0.03224
30432,SLRNT - 32014,C C S,1,0.00026
30425,QCET - 89701,C C R,10,0.2
30425,QLRU - 50701,C C R,30,0.6
Showing your expected output would be a big help.