I have an example situation: the parent table has a column named id, referenced in the child table as a foreign key.
When deleting a child row, how to delete the parent as well if it's not referenced by any other child?
In PostgreSQL 9.1 or later you can do this with a single statement using a data-modifying CTE. This is generally less error-prone: it minimizes the time frame between the two DELETEs in which race conditions could lead to surprising results with concurrent operations:
WITH del_child AS (
   DELETE FROM child
   WHERE  child_id = 1
   RETURNING parent_id, child_id
   )
DELETE FROM parent p
USING  del_child x
WHERE  p.parent_id = x.parent_id
AND    NOT EXISTS (
   SELECT FROM child c
   WHERE  c.parent_id = x.parent_id
   AND    c.child_id <> x.child_id   -- ! exclude the row deleted above: it is still visible in this snapshot
   );
The child is deleted in any case. I quote the manual:
Data-modifying statements in WITH are executed exactly once, and always to completion, independently of whether the primary query reads all (or indeed any) of their output. Notice that this is different from the rule for SELECT in WITH: as stated in the previous section, execution of a SELECT is carried only as far as the primary query demands its output.
The parent is only deleted if it has no other children.
Note the last condition. Contrary to what one might expect, this is necessary, since:
The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot (see Chapter 13), so they cannot "see" each others' effects on the target tables.
Bold emphasis mine.
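The snapshot rule quoted above can be observed directly. The following is only a sketch: demo_child is a hypothetical temporary table made up for this illustration (it is not part of the question), and everything runs inside a transaction that is rolled back at the end.

```sql
BEGIN;

-- Hypothetical throwaway table, assumed only for this demonstration.
CREATE TEMP TABLE demo_child (
   child_id  integer PRIMARY KEY
 , parent_id integer NOT NULL
);
INSERT INTO demo_child VALUES (1, 10);

-- The main query reads demo_child with the same snapshot as the DELETE
-- in the CTE, so it still sees the row that is being deleted.
WITH del AS (
   DELETE FROM demo_child
   WHERE  child_id = 1
   RETURNING child_id
   )
SELECT (SELECT count(*) FROM del)                           AS deleted_rows
     , (SELECT count(*) FROM demo_child WHERE child_id = 1) AS still_visible;

-- A separate, later statement gets a new snapshot and no longer sees the row.
SELECT count(*) AS remaining FROM demo_child;

ROLLBACK;
```

Both columns of the first SELECT report 1, while the follow-up statement reports 0. This is why the NOT EXISTS subquery has to exclude the child row being deleted explicitly.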
I used the column name parent_id in place of the non-descriptive id.
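For trying out the queries, a minimal schema along these lines should do. This is an assumption on my part: the question does not show the table definitions, so the column types and the sample rows below are made up for illustration.

```sql
-- Assumed schema, not taken from the question: only the key columns are modeled.
CREATE TABLE parent (
   parent_id integer PRIMARY KEY
);

CREATE TABLE child (
   child_id  integer PRIMARY KEY
 , parent_id integer NOT NULL REFERENCES parent
);

-- Hypothetical sample data:
-- parent 1 has two children, parent 2 has exactly one.
INSERT INTO parent (parent_id) VALUES (1), (2);
INSERT INTO child (child_id, parent_id) VALUES (1, 1), (2, 1), (12, 2);
```

With this data, deleting child 1 leaves parent 1 in place because child 2 still references it, while deleting child 12 also removes parent 2.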
To eliminate the possible race conditions completely, lock the parent row first. All similar operations must follow the same procedure to make it work.
WITH lock_parent AS (
   SELECT p.parent_id, c.child_id
   FROM   child  c
   JOIN   parent p ON p.parent_id = c.parent_id
   WHERE  c.child_id = 12   -- provide child_id here once
   FOR    NO KEY UPDATE     -- locks parent row.
   )
, del_child AS (
   DELETE FROM child c
   USING  lock_parent l
   WHERE  c.child_id = l.child_id
   )
DELETE FROM parent p
USING  lock_parent l
WHERE  p.parent_id = l.parent_id
AND    NOT EXISTS (
   SELECT FROM child c
   WHERE  c.parent_id = l.parent_id
   AND    c.child_id <> l.child_id   -- !
   );
This way only one transaction at a time can lock the same parent. So it cannot happen that multiple transactions delete children of the same parent, still see other children and spare the parent, while all of the children are gone afterwards. (Updates on non-key columns are still allowed with FOR NO KEY UPDATE.)
If such cases never occur, or if you can live with them happening (hardly ever), the first query is cheaper. Otherwise, this is the safe path.
FOR NO KEY UPDATE was introduced with Postgres 9.3. Details in the manual. In older versions use the stronger lock FOR UPDATE instead.
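For such older versions, the locking variant would look roughly like this. It is only a sketch of the same statement with the lock clause swapped; the example child_id 12 is carried over from above.

```sql
-- Same logic as the locking query above, with the stronger FOR UPDATE lock
-- for versions that do not offer FOR NO KEY UPDATE.
WITH lock_parent AS (
   SELECT p.parent_id, c.child_id
   FROM   child  c
   JOIN   parent p ON p.parent_id = c.parent_id
   WHERE  c.child_id = 12   -- provide child_id here once
   FOR    UPDATE            -- stronger lock, blocks all concurrent updates of the locked rows
   )
, del_child AS (
   DELETE FROM child c
   USING  lock_parent l
   WHERE  c.child_id = l.child_id
   )
DELETE FROM parent p
USING  lock_parent l
WHERE  p.parent_id = l.parent_id
AND    NOT EXISTS (
   SELECT 1 FROM child c    -- explicit select list, since an empty one may not be accepted by older versions
   WHERE  c.parent_id = l.parent_id
   AND    c.child_id <> l.child_id
   );
```

The trade-off is that FOR UPDATE also blocks concurrent updates to non-key columns of the locked rows until the transaction ends.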