Search code examples
postgresqlcsvunixawkksh

Formatting a date column in a CSV file in 'YYYYMMDDHH24MISS' to 'YYYYMMDD HH24MISS' format in ksh


I wrote a script which takes values from a CSV file and inserts in into a postgres table, but was facing a problem. The date columns in the CSV file are in 'YYYYMMDDHH24MISS' format, and the same column in the table are defined as Timestamp, and when inserting using the \copy command, I am getting the below error,

ERROR: 22008: date/time field value out of range: "20000125200942"

So I tried few things and figured out if I have the entries in 'YYYYMMDD HH24MISS' format, the insert works fine.

Now the problem is I don't know how to format the entries of the CSV file in ksh.

I tried using the gsub command with awk, but it didn't seem to work.

awk 'BEGIN {FS=OFS=","} {gsub(/^([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})$/, "\\1\\2\\3 \\4\\5\\6", $1)} 1' input_file.csv > output_file.csv

All your help would be appreciated 🙂


Solution

  • Assumptions:

    • all inputs (1st field) consist of exactly 14 digits in YYYYMMDDHHMMSS format

    Sample input:

    $ cat input_file.csv
    20000125200942,xxxxx,xxxx
    

    One awk idea where we redefine the 1st field ($1) with a pair of substrings separated by a space:

    awk 'BEGIN {FS=OFS=","} {$1=substr($1,1,8) " " substr($1,9)} 1' input_file.csv > output_file.csv
    

    This generates:

    $ cat output_file.csv
    20000125 200942,xxxxx,xxxx