Search code examples
postgresqlsql-updatebulkupdate

Most efficient way to do a bulk UPDATE with pairs of input


Suppose I want to do a bulk update, setting a=b for a collection of a values. This can easily be done with a sequence of UPDATE queries:

UPDATE foo SET value='foo' WHERE id=1
UPDATE foo SET value='bar' WHERE id=2
UPDATE foo SET value='baz' WHERE id=3

But now I suppose I want to do this in bulk. I have a two dimensional array containing the ids and new values:

[ [ 1, 'foo' ]
  [ 2, 'bar' ]
  [ 3, 'baz' ] ]

Is there an efficient way to do these three UPDATEs in a single SQL query?

Some solutions I have considered:

  1. A temporary table

    CREATE TABLE temp ...;
    INSERT INTO temp (id,value) VALUES (....);
    UPDATE foo USING temp ...
    

    But this really just moves the problem. Although it may be easier (or at least less ugly) to do a bulk INSERT, there are still a minimum of three queries.

  2. Denormalize the input by passing the data pairs as SQL arrays. This makes the query incredibly ugly, though

    UPDATE foo
    USING (
        SELECT
            split_part(x,',',1)::INT AS id,
            split_part(x,',',2)::VARCHAR AS value
        FROM (
            SELECT UNNEST(ARRAY['1,foo','2,bar','3,baz']) AS x
        ) AS x;
    )
    SET value=x.value WHERE id=x.id
    

    This makes it possible to use a single query, but makes that query ugly, and inefficient (especially for mixed and/or complex data types).

Is there a better solution? Or should I resort to multiple UPDATE queries?


Solution

  • Normally you want to batch-update from a table with sufficient index to make the merge easy:

    CREATE TEMP TABLE updates_table
            ( id integer not null primary key
            , val varchar
            );
    INSERT into updates_table(id, val) VALUES
     ( 1, 'foo' ) ,( 2, 'bar' ) ,( 3, 'baz' )
            ;
    
    UPDATE target_table t
    SET value = u.val
    FROM updates_table u
    WHERE t.id = u.id
            ;
    

    So you should probably populate your update_table by something like:


    INSERT into updates_table(id, val)
    SELECT
            split_part(x,',',1)::INT AS id,
            split_part(x,',',2)::VARCHAR AS value
        FROM (
            SELECT UNNEST(ARRAY['1,foo','2,bar','3,baz']) 
             ) AS x
         ;
    

    Remember: an index (or the primary key) on the id field in the updates_table is important. (but for small sets like this one, a hashjoin will probably by chosen by the optimiser)


    In addition: for updates, it is important to avoid updates with the same value, these cause extra rowversions to be created + plus the resulting VACUUM activity after the update was committed:

    UPDATE target_table t
        SET value = u.val
        FROM updates_table u
        WHERE t.id = u.id
        AND (t.value IS NULL OR t.value <> u.value)
                ;