Search code examples
awkcurrency-formatting

gawk FIELDWIDTHS convert to CSV, but convert cents to dollars for some fields


I am converting a transaction detail file received from the bank from fixed width into a CSV file.

 gawk '$1=$1' FIELDWIDTHS='1 8 12 23 6 6 6 7 13 9 3 10 1 2 3 1 3 1 1 2 6 3 12 3 3 1 2 9 19 140  ' OFS=, $f  >> $csvfilename

Some of the fields are currency fields and listed in cents. For example, the second last field will have a value "1234567". However, the value I want output is "12345.67".

I am looking for how I can do that conversion. Something like

FIELDWIDTHS='1 8 12 23 6 6 6 7.2 13 '

As an example. the following input is received

1000000000000027302080000000541282******876421111821111821111800648300000000021687000000000PU 831712 DR90 020N000000000323132106663755
0000000000000000000000000000000000000000000000000000000000000000000 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

The resulting CSV file is

1,00000000,000002730208,0000000448387******6098,211115,211116,211116,0120456,0000000051580,000000000,PU ,754045 , ,DR,90 ,0,20N,0,0,00,000003,211,31912995587 , , 0,0,00,000000000,0000000000000000000,0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

The 8th field has a value "0120456". I would like the value in the resulting comma delimited file to read "01204.56"


Solution

  • Here is a solution that I think you can use, or adapt.

    Bash script:

    #!/bin/bash
    
    line="1000000000000027302080000000541282******876421111821111821111800648300000000021687000000000PU 831712     DR90 020N000000000323132106663755 0000000000000000000000000000000000000000000000000000000000000000000 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"
    
    echo "$line" | awk -f so.awk
    

    so.awk

    BEGIN {
        FIELDWIDTHS = "1 8 12 23 6 6 6 7 13 9 3 10 1 2 3 1 3 1 1 2 6 3 12 3 3 1 2 9 19 140"
    }
    
    {
        # Print first 7 fields
        for (i=1; i<=7; i++) {
            printf "%s,", $i
        }
    
        # Print 8th field
        firstXchars = substr($8, 0, length($8)-2)
        lasttwochars = substr($8, length($8)-1)
        printf "%s.%s,", firstXchars, lasttwochars
    
        # Print the rest of fields, except the last one (I do not want the , for the last one)
        for (j=9; j<=NF-1; j++) {
            printf "%s,", $j
        }
    
        # Print the last field
        print $NF
    }
    

    What it does

    • print the first 7 fields separated by , , no changes
    • for the 8th field, extract the characters except the last two
    • for the 8th field, extract the last two characters
    • print field 8 with (except last two).(last two),
    • print all remaining fields but the last one separated by ,
    • print the last field

    Output

    $ ./so.bash 
    1,00000000,000002730208,0000000541282******8764,211118,211118,211118,00648.30,0000000021687,000000000,PU ,831712    , ,DR,90 ,0,20N,0,0,00,000003,231,32106663755 ,000,000,0,00,000000000,0000000000000000000,000000000000000000000000000000 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
                                                                         ^^^^^^^^
    

    Note ^^^^^^^^ is not part of the output, I just highlight field 8

    With the method highlighted here for managing field 8, you can adapt the script to fit your actual data and output requirements.