Search code examples
sqlpostgresqlsql-updateinner-join

Database UPDATE SET when joining tables and the value to update is not unique due to cartesian product


How does PostgreSQL (perhaps it is contemplated by standard SQL) behave when the SET UPDATE is not a unique record but a cartesian product?

Imagine for some reason the b table contains: (1,1),(1,1),(1,2) What should be the value to update (or does the database make a cartesian product or creates records or something)?

UPDATE table_a a
SET b_value = b.value
FROM (SELECT id, value FROM mdm.table_b) AS b
WHERE b.a_id = a.id;

Solution

  • Your query is not stable. Bad things might happen.

    The documentation is clear about that:

    When using FROM you should ensure that the join produces at most one output row for each row to be modified. In other words, a target row shouldn't join to more than one row from the other table(s). If it does, then only one of the join rows will be used to update the target row, but which one will be used is not readily predictable.

    Then:

    Because of this indeterminacy, referencing other tables only within sub-selects is safer, though often harder to read and slower than using a join.

    If you were to follow the documentation's advice, you could phrase the query as:

    update table_a a
    set b_value = (select max(b.value) from table_b b where b.a_id = a.id)
    where exists (select 1 from table_b b where b.a_id = a.id)
    

    The aggregate function in the subquery ensures that a single row will be returned (you could as well use min()). You can also express this with from:

    update table_a a
    set b_value = b.value
    from (select a_id, max(value) as value from table_b group by a_id) as b
    where b.a_id = a.id;