I am trying to copy a 7M-row table from PostgreSQL to ClickHouse.
I'm running Postgres in Docker and copying to a tab-separated values (TSV) dump file with:
docker exec -it db psql -U db_user db_name -c "COPY table_name ( datetime, value1, value2) TO STDOUT" > outfile.tsv
But the datetime column output is '2021-12-21 09:01:44+00'
Unfortunately my ClickHouse import ...
cat outfile.tsv | curl 'http://localhost:8123/?query=INSERT%20INTO%20my_ch_db.my_table%20FORMAT%20TSV' --data-binary @-
... needs the datetime column in format 2021-12-21 09:01:44
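If changing the export isn't convenient, the offset suffix can also be stripped on the way in. A minimal Python sketch (the function name `strip_tz` and the assumption that the suffix always looks like `+00` or `+05:30` are mine):

```python
import re

def strip_tz(ts: str) -> str:
    # Drop a trailing UTC-offset suffix such as '+00', '-05', or '+05:30'
    # so the value matches ClickHouse's plain DateTime text format.
    return re.sub(r'[+-]\d\d(:\d\d)?$', '', ts)

print(strip_tz('2021-12-21 09:01:44+00'))  # 2021-12-21 09:01:44
```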
I can get the desired format from Postgres using a SELECT with TO_CHAR
SELECT TO_CHAR(datetime, 'YYYY-MM-DD HH24:MI:SS') FROM my_table
But I can't find a way to combine the COPY and the TO_CHAR to get the desired format for the ClickHouse TSV import.
I get a syntax error if I try this:
docker exec -it db psql -U db_user db_name -c "COPY table_name ( TO_CHAR(datetime,'YYYY-MM-DD HH24:MI:SS'), value1, value2) TO STDOUT" > outfile.tsv
ERROR: syntax error at or near "("
LINE 1: COPY my_table (TO_CHAR(datetime, 'YYYY-MM-DD H...
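For what it's worth, COPY accepts a full query in parentheses in place of a table name, which avoids the syntax error above. A sketch using the same container and credentials as the question:

```shell
# COPY (query) TO STDOUT lets you apply TO_CHAR per column.
# -i without -t: a TTY can inject carriage returns into redirected output.
docker exec -i db psql -U db_user db_name \
  -c "COPY (SELECT TO_CHAR(datetime, 'YYYY-MM-DD HH24:MI:SS'), value1, value2 FROM table_name) TO STDOUT" \
  > outfile.tsv
```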
ANSWER: From uysizfoz: add the setting date_time_input_format=best_effort to the URL like this:
cat outfile.tsv | curl 'http://localhost:8123/?date_time_input_format=best_effort&query=INSERT%20INTO%20my_ch_db.my_table%20FORMAT%20TSV' --data-binary @-
More information on ClickHouse settings is available in the ClickHouse documentation.
In the end I needed to convert the BOOL column too, so I wrote a little Python script (I really should have done this in the first place):
import sys

# Usage: python convert.py infile.tsv outfile.tsv
# Rewrites column 8 (0-based index 7), a Postgres BOOL exported as 't'/'f',
# as the '1'/'0' that ClickHouse expects. Output is opened in append mode.
with open(sys.argv[1], 'r') as infile, open(sys.argv[2], 'a') as outfile:
    for line in infile:
        fields = line.split('\t')
        fields[7] = '1' if fields[7] == 't' else '0'
        outfile.write('\t'.join(fields))
ANSWER 2: Use the CSV import method, adding the setting date_time_input_format=best_effort.
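Combining the two answers, the whole pipeline with CSV might look like this (a sketch reusing the names from the question; COPY's WITH (FORMAT csv) option produces CSV directly):

```shell
# Export as CSV from Postgres and stream straight into ClickHouse,
# letting best_effort parsing handle the '+00' timestamp suffix.
docker exec -i db psql -U db_user db_name \
  -c "COPY table_name (datetime, value1, value2) TO STDOUT WITH (FORMAT csv)" \
  | curl 'http://localhost:8123/?date_time_input_format=best_effort&query=INSERT%20INTO%20my_ch_db.my_table%20FORMAT%20CSV' --data-binary @-
```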