Search code examples
csvawkgawkbsd

awk calculations based on field value


BIG EDIT

I had a misunderstanding on the calculations so I'll need to re-do this.

Raw sample data:

2600,AEIOU-2600,stack,01,04/28/2015,C C R,S,1.07
2600,AEIOU-2600,stack,02,04/01/2015,C C S,S,10.65
2601,"over, L.P. - 00001",stack,01,04/01/2015,C C S,s,16.43
2601,"over, L.P. - 000001",stack,02,04/01/2015,D C S,s,17.16
2602,UEIA,stack,01,04/28/2015,C C R,s,10
2602,UEIA,stack,02,04/28/2015,C C R,s,20

Field $1:numeric

Field $2: name

Field $3: account

Field $4: invoice number

Field $5: date

Field $6: type

Field $7: state

Field $8: amount

Script:

#!/usr/local/bin/gawk -f

BEGIN {
FPAT = "\"[^\"]*\"|[^,]*"

OFS = ","
}
NR > 1 {
arr[$1 OFS $2 OFS $6]++
}
END {
for (key in arr)
    print key, arr[key]
}

Output of script above:

2601,"over, L.P. - 000001",D C S,1
2601,"over, L.P. - 00001",C C S,1
2602,UEIA,C C R,2
2600,AEIOU-2600,C C S,1

I need to group based on field $1 and then field $6.

Once the script runs, it will generate data that looks like this:

2600,AEIOU-2600,C C R,1,1.07
2600,AEIOU-2600,C C S,1,10.65
2601,"over, L.P. - 00001",C C S,2,33.59
2602,UEIA,C C R,2,30

We see that 2600 had two different types so they each have their own line, and we see 2601 had two C C S combined are 33.59. Basically grouping on field $1 and then on field $6.

THEN I can multiply results for C C R and C C A at .02 (number of occurrences) and everything else at its total * .02*.0013.

2600,AEIOU-2600,C C R,1,1.07,0.0214
2600,AEIOU-2600,C C S,1 10.65,0.0002769
2601,"over, L.P. - 00001",C C S,2,33.59,0.00087334
2602,UEIA,C C R,2,30,0.00078

Field $1:numeric

Field $2: name

Field $3: type

Field $4: occurrences

Field $5 total

Field $5: calc_total

EDIT

#!/usr/local/bin/gawk -f

BEGIN {
FPAT = "\"[^\"]*\"|[^,]*"

OFS = ","
}
NR > 1 {
    arr[$1 OFS $2 OFS $6 OFS $8]++
}
END {
    for (key in arr)
        print key, arr[key]
}

This is getting better but now I need a second group and group it by field $3. Right now the above script groups by $1 and then $5.


Solution

  • Is this what you're looking for?

    $ cat tst.awk
    BEGIN { FPAT="([^,]*)|(\"[^\"]+\")"; OFS="," }
    { print $0, $4 * .02 * ($3 ~ /C C [RA]/ ? 1 : .013) }
    
    $ awk -f tst.awk file
    30409,DPUMA - 147803,D C S,2,0.00052
    30392,"SNI TIC , L.P. - 93001",C C S,175,0.0455
    30425,QJEU - 98701,C C R,1,0.02
    30397,"LSRNC , L.P. - 01236",D C S,124,0.03224
    30432,SLRNT - 32014,C C S,1,0.00026
    30425,QCET - 89701,C C R,10,0.2
    30425,QLRU - 50701,C C R,30,0.6
    

    Showing your expected output would be a big help.