Search code examples
sqlnode.jspostgresqlknex.js

How to Ignore error in batch insert Postgresql


I have a process that runs every 5 minutes and tries to insert a batch of articles into a table. The articles come from web-scraping, so there are cases in which I am trying to insert a batch that contains articles which have already been saved into the DB.

My primary key is uuid - an MD5 hash of the article title.

Checking if an article exists in the db to filter the batch is kinda inefficient.

Is it a DB level way in Postgresql to ignore the attempts of inserting a duplicate uuid without returning an Error?


Solution

  • Solution

    You could insert using the WHERE NOT EXISTS clause.

    For example, consider a test table with a numeric id as primary key and a textual name.

    Code

    db=> CREATE TABLE test(id BIGSERIAL PRIMARY KEY, name TEXT);
    CREATE TABLE
    
    -- Insertion will work - empty table
    db=> INSERT INTO test(id, name) 
         SELECT 1, 'Partner number 1' 
         WHERE NOT EXISTS (SELECT 1,2 FROM test WHERE id=1);
    INSERT 0 1
    
    -- Insertion will NOT work - duplicate id
    db=> INSERT INTO test(id, name) 
         SELECT 1, 'Partner number 1' 
         WHERE NOT EXISTS (SELECT 1,2 FROM test WHERE id=1);    
    INSERT 0 0
    
    -- After two insertions, the table contains only one row
    db=> SELECT * FROM test;
     id |       name
    ----+------------------
      1 | Partner number 1
    (1 row)
    

    Difference from ON CONFLICT

    Quoting the documentation:

    ON CONFLICT can be used to specify an alternative action to raising a unique constraint or exclusion constraint violation error.

    The action can be DO NOTHING, or a DO UPDATE. The second approach is often referred to as Upsert - a portmanteau of Insert and Update.

    Technically WHERE NOT EXISTS is equivalent to ON CONFLICT DO NOTHING. See the query plans for a deeper dive.