Tags: postgresql, csv, duplicates, amazon-rds, pgadmin

Import CSV into a table in PostgreSQL, ignoring duplicates - Amazon AWS/RDS


I have a PostgreSQL database hosted on AWS (RDS). I've created a couple of tables and imported some .csv files into these tables using the "Import/Export" tool in pgAdmin 4.

Every month I'll need to update the data in my tables, and I'll do that by uploading .csv files.

The issue I'm facing right now is that I'm trying to insert new data into a table from a .csv file, but I need to ignore duplicate values.

I found a way to do that (code below), but the COPY command does not work in pgAdmin; COPY only works if I use the Import/Export tool.

-- Create an empty temporary table with the same structure as the target table
CREATE TEMP TABLE tmp_table
ON COMMIT DROP
AS
SELECT *
FROM "indice-id-cnpj"
WITH NO DATA;

-- Server-side COPY: reads the file from the database server's file system
COPY tmp_table FROM 'C:/Users/Win10/Desktop/Dados/ID-CNPJ.csv';

-- Insert into the target table, skipping rows that hit a unique constraint
INSERT INTO "indice-id-cnpj"
SELECT *
FROM tmp_table
ON CONFLICT DO NOTHING;
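
From what I understand, ON CONFLICT DO NOTHING only skips rows that violate a unique (or exclusion) constraint, so the target table needs a primary key or unique constraint on the column(s) that identify a duplicate. A minimal sketch, assuming a hypothetical id column:

-- Only needed if the table doesn't already have a suitable constraint
ALTER TABLE "indice-id-cnpj"
ADD CONSTRAINT indice_id_cnpj_id_key UNIQUE (id);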

This is my first experience with PostgreSQL (apart from a subject at uni). I can deal with the issue by using Excel and doing a little manual work, but I'm looking for a long-term solution for keeping the tables updated from the .csv files while always ignoring the duplicates.

Thanks in advance.


Solution

  • So, I've found a solution.

    As Adrian mentioned, I had to use psql.

    -- Empty copy of the target table's structure (quote the name because of the hyphens)
    CREATE TEMP TABLE tmp_table AS SELECT * FROM "table-name" WITH NO DATA;
    -- \copy is psql's client-side copy: it reads the file from the local machine
    \copy tmp_table FROM 'C:/Users/Win10/folder/filename.csv' DELIMITER ',' CSV ENCODING 'UTF8'

    -- Move the rows into the real table, skipping duplicates, then clean up
    INSERT INTO "table-name" SELECT * FROM tmp_table ON CONFLICT DO NOTHING;
    DROP TABLE tmp_table;
    

    Since I'm using psql, it's necessary to use the \copy meta-command instead of COPY: COPY FROM runs on the database server and can't see files on my machine, while \copy reads the file on the client and sends it to the server, which is what's needed with a remote RDS instance. Also, every SQL statement must end with a ";" (the \copy meta-command is terminated by the end of the line instead), and tmp_table has to be dropped at the end because it wasn't created with ON COMMIT DROP.
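
    For the monthly updates this can be saved as a script and run non-interactively. A minimal sketch, assuming a hypothetical file name monthly_update.sql and placeholder connection details; wrapping everything in one transaction and using ON COMMIT DROP also makes the manual DROP TABLE unnecessary:

    -- monthly_update.sql
    BEGIN;
    CREATE TEMP TABLE tmp_table ON COMMIT DROP AS SELECT * FROM "table-name" WITH NO DATA;
    \copy tmp_table FROM 'C:/Users/Win10/folder/filename.csv' DELIMITER ',' CSV ENCODING 'UTF8'
    INSERT INTO "table-name" SELECT * FROM tmp_table ON CONFLICT DO NOTHING;
    COMMIT;

    It can then be run against the RDS endpoint with psql (host, user and database below are placeholders):

    psql -h your-rds-endpoint.amazonaws.com -U your_user -d your_db -f monthly_update.sql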