sql, performance, postgresql, csv, postgresql-copy

Check if records exist in a Postgres table


I have to read a CSV file every 20 seconds. Each file contains between 500 and 60,000 lines. I have to insert the data into a Postgres table, but first I need to check whether the items have already been inserted, because there is a high probability of duplicates. The field checked for uniqueness is indexed.

So I read the file in chunks and use an IN clause to fetch the items that are already in the database.
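
For illustration, the per-chunk lookup described here might look roughly like the sketch below. The table name `tbl`, the key column `tbl_id`, and the literal keys are assumptions made for the example, not details from the actual schema.

```sql
-- Hypothetical per-chunk lookup; tbl and tbl_id are assumed names,
-- and the literal keys stand in for the values of one CSV chunk.
SELECT tbl_id
FROM   tbl
WHERE  tbl_id IN (101, 102, 103);  -- keys read from the current chunk
```

Any key from the chunk that this query does not return is then inserted by the client.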

Is there a better way of doing it?


Solution

  • This should perform well:

    CREATE TEMP TABLE tmp AS SELECT * FROM tbl LIMIT 0; -- copy layout, but no data
    
    COPY tmp FROM '/absolute/path/to/file' (FORMAT csv);
    
    INSERT INTO tbl
    SELECT tmp.*
    FROM   tmp
    LEFT   JOIN tbl USING (tbl_id)
    WHERE  tbl.tbl_id IS NULL;
    
    DROP TABLE tmp; -- else dropped at end of session automatically
    

    Closely related to this answer.
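
    A possible variation, as a sketch rather than a drop-in replacement: on PostgreSQL 9.5 or later, and assuming `tbl_id` is covered by a UNIQUE index or constraint (the question only says the column is indexed, so this is an assumption), `INSERT ... ON CONFLICT DO NOTHING` can replace the anti-join. Unlike the `LEFT JOIN` version, it also skips keys that are repeated within the file itself.

```sql
-- Assumes PostgreSQL 9.5+ and a UNIQUE index or constraint on tbl.tbl_id.
CREATE TEMP TABLE tmp AS SELECT * FROM tbl LIMIT 0;  -- copy layout, but no data

COPY tmp FROM '/absolute/path/to/file' (FORMAT csv);

INSERT INTO tbl
SELECT *
FROM   tmp
ON     CONFLICT (tbl_id) DO NOTHING;  -- skip rows whose key already exists

DROP TABLE tmp;
```

    Note that `COPY ... FROM 'file'` reads the file on the database server and needs the corresponding privileges. If the CSV lives on the client machine, psql's `\copy` meta-command takes the same options and reads the file client-side.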