My use case is the following: I have data coming from a csv file and I need to load it into a table (so far so good, nothing new here). It might happen that the same data is sent again later with updated column values, in which case I would like to insert the row and replace the existing one on duplicate.
So my table is as follows:
CREATE TABLE codes (
code TEXT NOT NULL,
position_x INT,
position_y INT,
PRIMARY KEY (code)
);
And incoming csv file is like this:
TEST01,1,1
TEST02,1,2
TEST03,1,3
TEST04,1,4
It might happen that sometime in the future I get another csv file with:
TEST01,1,1000 <<<<< updated value
TEST05,1,5
TEST06,1,6
TEST07,1,7
Right now, the first file loads fine, but when I run pgloader for the second one I get this error:
2017-04-26T10:33:51.306000+01:00 ERROR Database error 23505: duplicate key value violates unique constraint "codes_pkey"
DETAIL: Key (code)=(TEST01) already exists.
I load data using:
pgloader csv.load
And my csv.load file looks like this:
LOAD CSV
FROM 'codes.csv' (code, position_x, position_y)
INTO postgresql://localhost:5432/codes?tablename=codes (code, position_x, position_y)
WITH fields optionally enclosed by '"',
fields terminated by ',';
Is what I'm trying to do possible with pgloader?
I also tried dropping the primary key constraint, but then I end up with duplicate entries in the table.
Thanks a lot for your help.
No, you can't do this with pgloader alone. As the reference documentation puts it:
To work around that (load exceptions, eg PK violations), pgloader cuts the data into batches of 25000 rows each, so that when a problem occurs it's only impacting that many rows of data.
(The text in brackets is mine.) In other words, pgloader skips the rows that violate the constraint; it does not update them.
The best you can do is load the csv into a staging table with the same structure and then merge the data with a query (EXCEPT, an OUTER JOIN ... WHERE ... IS NULL, and so on).
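A minimal sketch of that staging-table approach, assuming PostgreSQL 9.5+ (for INSERT ... ON CONFLICT) and a staging table I've named codes_staging here:

```sql
-- Staging table with the same structure as codes, but no primary key,
-- so pgloader can load the csv without hitting the unique constraint.
CREATE TABLE codes_staging (
    code TEXT NOT NULL,
    position_x INT,
    position_y INT
);

-- After pointing the tablename in csv.load at codes_staging and running
-- pgloader, merge the staged rows into codes, replacing the position
-- values whenever a code already exists.
INSERT INTO codes (code, position_x, position_y)
SELECT code, position_x, position_y
FROM codes_staging
ON CONFLICT (code) DO UPDATE
SET position_x = EXCLUDED.position_x,
    position_y = EXCLUDED.position_y;

-- Empty the staging table so it is ready for the next csv file.
TRUNCATE codes_staging;
```

If your csv files can contain the same code twice within one file, deduplicate in the SELECT first (e.g. with DISTINCT ON (code)), because ON CONFLICT cannot update the same row twice in a single statement.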