Formatting a date column in a CSV file in 'YYYYMMDDHH24MISS' to 'YYYYMMDD HH24MISS' format in ksh

I wrote a script which takes values from a CSV file and inserts in into a postgres table, but was facing a problem. The date columns in the CSV file are in 'YYYYMMDDHH24MISS' format, and the same column in the table are defined as Timestamp, and when inserting using the \copy command, I am getting the below error,

ERROR: 22008: date/time field value out of range: "20000125200942"

So I tried few things and figured out if I have the entries in 'YYYYMMDD HH24MISS' format, the insert works fine.

Now the problem is I don't know how to format the entries of the CSV file in ksh.

I tried using the gsub command with awk, but it didn't seem to work.

awk 'BEGIN {FS=OFS=","} {gsub(/^([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})$/, "\\1\\2\\3 \\4\\5\\6", $1)} 1' input_file.csv > output_file.csv

All your help would be appreciated 🙂

Solution

Assumptions:

all inputs (1st field) consist of exactly 14 digits in YYYYMMDDHHMMSS format

Sample input:

$ cat input_file.csv
20000125200942,xxxxx,xxxx

One awk idea where we redefine the 1st field ($1) with a pair of substrings separated by a space:

awk 'BEGIN {FS=OFS=","} {$1=substr($1,1,8) " " substr($1,9)} 1' input_file.csv > output_file.csv

This generates:

$ cat output_file.csv
20000125 200942,xxxxx,xxxx