Search code examples
sqlpostgresqlmerge

Clean approach to achieving Merge functionality in Postgres, for when not matched by target/source


I attempted to use a MERGE statement in Postgres 16.2, and you can find the example below:

MERGE INTO schema1.target_table AS t
USING schema2.source_table AS s
ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET
        data1 = s.data1,
        data2 = s.data2
WHEN NOT MATCHED BY TARGET
    INSERT (id, data1, data2,)
    VALUES (s.id, s.data1, s.data2)
WHEN NOT MATCHED BY SOURCE
    DELETE;

While trying this query, I discovered that Postgres has supported the MERGE statement since version 15, but it does not support the "BY TARGET" or "BY SOURCE" statement, throwing this error:

SyntaxError: syntax error at or near "BY"
LINE 8: WHEN NOT MATCHED BY TARGET

In my search for cleaner alternatives to achieve the same result, I came across this helpful but old answer. While it worked perfectly, I find it a bit complicated to understand and would prefer to avoid it to keep my code as clean as possible.

Do you know of any way to achieve the same result in a cleaner manner?


Solution

  • You didn't say what's not clean about the old query, but here's an alternative: demo

    with cte as (
      delete from schema1.target_table 
      where id not in (select id from schema2.source_table) )
    insert into schema1.target_table
    table schema2.source_table
    on conflict(id) do update set data1=excluded.data1,
                                  data2=excluded.data2;
    

    The principle remains exactly the same: that's an old-school PostgreSQL upsert, with a CTE to fill in for the non-standard WHEN NOT MATCHED BY SOURCE extension found in Ms SQL Server.

    In the demo you can see there's effectively very little difference between them - if you tweak the sample size and reorder tests, it seems like any advantage of this here over the old one is down to noise. You can tweak both to instead do anti-joins, except, not exists (also leads to a hashed right anti-join), etc, depending on how your table's actually set up and indexed.

    Also, the two operations (delete+upsert) wrapped in a transaction are no different than gluing them together into a single statement, but the 2in1 is certainly cleaner and a bit more convenient to use. Safer if you account for being less prone to bugs related to concurrency.

    Demo also shows @MatBailie's immediate suggestion to truncate in a transaction is reliably faster than either of these. Be warned that while it's transaction-safe, it's not MVCC-safe.

    If you decide not to wipe&repopulate the table, you can consider adjusting your fillfactor storage parameter: by default it's 100, so your updates and inserts go to new pages. Making it 50 makes them half-empty and able to accept new (versions of) rows, in-place. I think the demo got 25% shorter exec times after adding that in.