I'm working in an environment where I expect several containers to be running queries to "claim" a database row. The multiple containers are for redundancy, but I only ever want one container to succeed. There's an obvious contention / race condition here, and I am currently thinking about solving this via transactions or row locking.
I am using Postgres 15. I am currently trying this query:
WITH find_unprocessed_request AS (
SELECT
id
FROM
request
WHERE
handler_server IS NULL
AND status = :old_status
ORDER BY
created_at ASC
LIMIT 1
FOR UPDATE
)
UPDATE request
SET
status = :new_status,
handler_server = :handler_server
FROM
find_unprocessed_request
WHERE
request.id = find_unprocessed_request.id
You can see that a row is found if it has an old status and there's no handler server, and I want a server only to write its host-name in such a record if it is NULL.
I based my query on the question relating to this answer, though I added the FOR UPDATE
per the answer (my non-expert reading of that is that I do need this clause).
However, having read this answer I wonder if I need a multi-statement transaction to be certain to avoid race conditions. I will take that approach if I have to, but I mildly would like to avoid it, since my driver (PDO/PHP) seems to be saying that multi-statement executions cannot also use parameter binding.
I think this answer is also agreeing with the basic correctness of adding FOR UPDATE
, though the solution here is UPDATE WHERE
using a sub-select instead of a CTE. I'd be happy to do that instead if it is regarded as safe, since it's just a single query, and should work fine with my driver.
Your statement looks basically fine. If each transaction tries to process the next free ID (of many), then I would throw in SKIP LOCKED
so that none waits on a locked row (just to give up when it comes back with handler_server IS NOT NULL
after the concurrent transaction has updated it). Plus a couple other suggestions:
WITH find_unprocessed_request AS (
SELECT id
FROM request
WHERE handler_server IS NULL
AND status = :old_status
ORDER BY id -- ①
LIMIT 1
FOR NO KEY UPDATE SKIP LOCKED -- ②
)
UPDATE request r
SET status = :new_status
, handler_server = :handler_server
FROM find_unprocessed_request f
WHERE r.id = f.id;
① Use columns that satisfy your ordering requirements and match an existing B-tree index. You commented that processing rows in the order of id
values (a serial column) is good enough. Assuming relatively few with handler_server IS NULL
, a partial index would make sense - which becomes cheaper if we don't have to include created_at
:
CREATE INDEX ON request (status, id) WHERE handler_server IS NULL;
② Assuming that none of the updated columns is part of unique index, we can use a weaker (and cheaper) FOR NO KEY UPDATE
lock, because the following UPDATE
is also satisfied with it. Thanks to Rabban for hinting at that.
See details in the manual.
Now that we have added SKIP LOCKED
, it's no longer important to make ORDER BY
unambiguous - like I had suggested at first. That would matter otherwise to avoid deadlocks, and to process rows in a strictly deterministic order - often a requirement, but not in your case.
Like you found yourself, for the simple case of a single locked ID, a subquery is slightly simpler and cheaper, yet:
UPDATE request
SET status = :new_status
, handler_server = :handler_server
WHERE id = (
SELECT id
FROM request
WHERE handler_server IS NULL
AND status = :old_status
ORDER BY id
LIMIT 1
FOR NO KEY UPDATE SKIP LOCKED
);
Either way, with SKIP LOCKED
you can never be certain that all rows have been processed. Irrelevant for a process that is perpetuated ad infinitum anyway. Else, read up on details in my reference answer: