I have two tables in my DB: my source of truth table (call it table A) and a denormalized table (call it table B).
A row in table A, will have a column status
that is a BOOLEAN
, and table B only contains a row if the status
in table A is true
.
Since table B is a new addition to my DB, I need to populate it from the already-existing data in table A. I use a PLPGSQL function to do that:
CREATE OR REPLACE FUNCTION backfill_single_batch() AS
$$
...
CREATE TEMP TABLE temp ON COMMIT DROP AS
SELECT ...
FROM tableA
WHERE status IS TRUE
FOR UPDATE;
CREATE TEMP TABLE inserted_rows ON COMMIT DROP AS
WITH ins AS (
INSERT INTO tableB
SELECT * FROM temp
ON CONFLICT (..) DO NOTHING
RETURNING *)
SELECT *
FROM ins;
...
Note the FOR UPDATE
in the first query. I believe this is needed so that if a row changes from TRUE
to FALSE
after the first CREATE TEMP
, I do not want to insert it.
Meanwhile, of course, I have a trigger that deletes rows from tableB whenever a status is updated from TRUE
to FALSE
. Here it is:
CREATE OR REPLACE FUNCTION on_status_update()
RETURNS TRIGGER AS $$
BEGIN
IF OLD.status IS TRUE AND NEW.status IS FALSE THEN
DELETE FROM tableB WHERE ...;
ELSIF OLD.status IS FALSE AND NEW.status IS TRUE THEN
INSERT INTO table B (...)
SELECT ...
FROM tableA
WHERE ...
ON CONFLICT DO NOTHING;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trigger_on_status_update
AFTER UPDATE ON tableA
FOR EACH ROW
EXECUTE FUNCTION on_status_update();
The issue now:
I am seeing rows with status FALSE
being inserted by my PLPGSQL function after they were removed by the trigger.
So the sequence of this is:
FALSE
What am I doing wrong here? I would have expected that the FOR UPDATE
in my function would be enough to guard against this scenario, as these rows would be locked until the function executes but evidently not.
As long as you define the trigger before you start your script for the already existing data, the inconsistency you describe cannot happen.
One shortcoming in your setup is that you are missing triggers for INSERT
and DELETE
.
Your bulk update procedure could be greatly simplified:
WITH truerows AS (
SELECT * FROM tablea
WHERE bool
FOR SHARE
)
INSERT INTO tableb
SELECT * FROM truerows;
A SHARE
lock is sufficient to prevent concurrent data modifications.