Search code examples
sqlpostgresqldatetimeclickhouse

How to reformat datetime value in PostgreSQL COPY dump to TSV for ClickHouse import?


(I am trying to copy a 7m row table from PostgreSQL to ClickHouse)

I'm running Postgres in Docker and copying to a tab separated variable dump file with:

docker exec -it db psql -U db_user db_name  -c "COPY table_name ( datetime, value1, value2) TO STDOUT" > outfile.tsv

But the datetime column output is '2021-12-21 09:01:44+00'

Unfortunately my ClickHouse import ...

cat outfile.tsv | curl 'http://localhost:8123/?query=INSERT%20INTO%20my_ch_db.my_table%20FORMAT%20TSV' --data-binary @-

... needs the datetime column in format 2021-12-21 09:01:44

I can get the desired format from Postgres using a SELECT with TO_CHAR

SELECT TO_CHAR(datetime,'YYYY-MM-DD HH24:MI::SS') FROM my_table 

But can't find a way to combine the COPY and the TO_CHAR to get the desired format for my ClickHouse TSV import.

I get a syntax error if I try this:

docker exec -it db psql -U db_user db_name  -c "COPY table_name ( TO_CHAR(datetime,'YYYY-MM-DD HH24:MI::SS'), value1, value2) TO STDOUT" > outfile.tsv

ERROR:  syntax error at or near "("
LINE 1: COPY my_table (TO_CHAR(datetime, 'YYYY-MM-DD H...

ANSWER: From uysizfoz add the setting to the URL like this:

cat outfile.tsv | curl 'http://localhost:8123/?date_time_input_format=best_effort&query=INSERT%20INTO%20my_ch_db.my_table%20FORMAT%20TSV' --data-binary @-

More ClickHouse settings info is here.

In the end I needed to change the BOOL col too, so wrote a little Python script as follows (should've done this in the first place really)

import sys

infile = open(sys.argv[1],'r')
outfile = open(sys.argv[2],'a')

line_arr = []
while True:
    next_line = infile.readline()
    if not next_line:
        break;
    line_arr = next_line.split('\t')
    if line_arr[7] == 't':
        line_arr[7] = '1'
    else:
        line_arr[7] = '0'
    converted_list = [str(element) for element in line_arr]
    joined_string = "\t".join(converted_list)
    outfile.write(joined_string)
infile.close()
outfile.close()

ANSWER 2: Use the CSV import method described here.


Solution

  • Add a setting date_time_input_format=best_effort