Search code examples
linuxshellawksed

How to change the format of 1st and 5th date column from 28-MAR-2024 to 03/28/2024


Example Input File

AsofDate,Portfolio,Shock,CUSIP,DATE,BALANCE,INTEREST,SCHED_PRIN,UNSCHED_PRIN,LOSS
28-MAR-2024,INV,DN50,28623RAE8,22-APR-2024,25000000,447380,0,0,0
28-MAR-2024,INV,DN50,28623RAE8,22-JUL-2024,25000000,413892,0,0,0
28-MAR-2024,INV,DN50,28623RAE8,21-OCT-2024,25000000,399624.25,0,0,0
28-MAR-2024,INV,DN50,28623RAE8,21-JAN-2025,25000000,386169.5,0,0,0

Lets say above is my input file, I would like to change the format of 1st (AsofDate) and 5th (DATE) column date form from dd-MMM-YYYY (Example: 28-MAR-2024) to mm/dd/YYYY (Example: 03/28/2024).

Tried the below command for 1st column,

awk -F',' '{gsub(/"/,"",$1); cmd="date -d\""$1"\" +\\\"%m/%d/%Y\\\""; cmd |getline $1; close(cmd); print >FILENAME }' OFS=, $FILE

The problem with this command is, its taking more time to process. The file size is 26000KB and I need to iterate for all the files under a directory. How to modify this command to achieve better performance or efficiency?


Solution

  • OP's current code takes so long to run because of the repeated calls out to the OS to have date reformat the date fields. We can eliminate these expensive OS/date calls with some simple splitting and reformatting of the date fields.

    Assumptions:

    • data fields do not contain embedded commas, ie, commas only show up as delimiters

    One awk idea:

    awk '
    
    function reformat(_date, c) {
        split(_date,c,"-")                             # split _date on "-", place parts in array c[]
        return map[c[2]] "/" c[1] "/" c[3]             # return parts in desired format
    }
    
    BEGIN { FS = OFS = ","
            split("JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC",a)
            for (i in a)
                map[a[i]] = sprintf("%02d",i)          # convert a[1]=JAN to map[JAN]=01
          }
    
    FNR>1 { $1 = reformat($1)                          # skip header line; replace $1 with new date format
            $5 = reformat($5)                          # replace $5 with new date format
          }
    1                                                  # print current line
    ' sample.file
    

    This generates:

    AsofDate,Portfolio,Shock,CUSIP,DATE,BALANCE,INTEREST,SCHED_PRIN,UNSCHED_PRIN,LOSS
    03/28/2024,INV,DN50,28623RAE8,04/22/2024,25000000,447380,0,0,0
    03/28/2024,INV,DN50,28623RAE8,07/22/2024,25000000,413892,0,0,0
    03/28/2024,INV,DN50,28623RAE8,10/21/2024,25000000,399624.25,0,0,0
    03/28/2024,INV,DN50,28623RAE8,01/21/2025,25000000,386169.5,0,0,0