Search code examples
sqlpostgresqlconcurrencysql-updatedeadlock

Guarantee order in UPDATE FROM to avoid deadlocks


A query executes concurrently to atomically increment some counters with an aggregate numeric value:

WITH data (
    id,
    delta
) AS (
    VALUES(1,1.2),(2,2.0),(2,0.5)
), agg_data AS (
SELECT
    id::bigint,
    SUM(delta::numeric) AS sum_delta
FROM
    data
GROUP BY
    id
ORDER BY
    id
)
UPDATE
    counters
SET
    val = counters.val + agg_data.sum_delta
FROM
    agg_data
WHERE
    counters.id = agg_data.id;

The order of the id column in the CTE agg_data is set, but the UPDATE statement does not follow it and is updating rows in counters in arbitrary order, which causes deadlocks. Unfortunately, ORDER BY is not supported in UPDATE (wondering if there is a reason for that).

While I could add a sub query which would SELECT FOR UPDATE, it feels like I'm over complicating things:

WITH data (
    id,
    delta
) AS (
    VALUES(1,1.2),(2,2.0),(2,0.5)
), agg_data AS (
SELECT
    id::bigint,
    SUM(delta::numeric) AS sum_delta
FROM
    data
GROUP BY
    id
ORDER BY
    id
), locked_counters AS ( -- add this part to lock in sequential order
   SELECT 
   FROM
      counters
   WHERE
      id
   IN (SELECT id FROM agg_data)
   ORDER BY 
      id
   FOR NO KEY UPDATE)
UPDATE
    counters
SET
    val = counters.val + agg_data.sum_delta
FROM
    agg_data
WHERE
    counters.id = agg_data.id;

Considering that my counters table could be very big and this query needs to be fully optimized for efficiency and integrity - is there a better way to go about this?


Solution

  • Sadly, there is no ORDER BY for UPDATE. We have to take explicit row-level locks with one of the SELECT FOR UPDATE family of tools. And FOR NO KEY UPDATE after ORDER BY in a separate CTE seems like the best choice. You are on the right track.

    Since ...

    this query needs to be fully optimized for efficiency and integrity

    I have a couple of suggestions:

    WITH data (id, delta) AS (
       VALUES (null::bigint, null::numeric)  -- ①
       UNION ALL
       VALUES (1,1.2), (2,2.0), (2,0.5)
       )
    , agg_data AS ( -- ②
       SELECT d.*
       FROM  (
          SELECT id, sum(delta) AS sum_delta
          FROM   data
          GROUP  BY id
          ORDER  BY id   -- optional
          ) d
       JOIN   counters USING (id)  -- ③
       ORDER  BY id
       FOR    NO KEY UPDATE OF counters  -- ② 
       )
    UPDATE counters c
    SET    val = c.val + a.sum_delta
    FROM   agg_data a
    WHERE  c.id = a.id;
    

    ① Don't let input values default to some data type, only to cast them later. That's wasted effort (with the potential of introducing rounding errors). Instead, cast input to the right type immediately. You could do that by casting values of the first (or any) row in the VALUES expression:

    VALUES (bigint '1', numeric '1.2'), (2,2.0), (2,0.5)
    

    That may be inconvenient as it forces you to edit your input. The next best thing is something like I did above. Then you have an immutable, leading row of null values of the right data types forcing the rest of the set to fall in line. The dummy row is eliminated automatically in the subsequent join. For details, see:

    ② You had a CTE agg_data to materialize the result of the aggregate, and another CTE locked_counters to take row locks in order. You can do both in a single CTE. However, quoting the manual:

    Currently, FOR NO KEY UPDATE, FOR UPDATE, FOR SHARE and FOR KEY SHARE cannot be specified with GROUP BY.

    Postgres would raise an exception before it realizes that locks are not applicable for that derived table anyway. A shortcoming, but we can work around it with FOR NO KEY UPDATE OF counters.

    JOIN is typically faster than IN (SELECT ...). IN tries to fold duplicate from the subquery. After GROUP BY id, there cannot be any dupes, but Postgres would not know and still follow a more expensive code path. Also, the JOIN immediately removes input rows without match in the target, which can be a notable gain if there are many.

    In theory this should be a bit faster. You'll have to test.