Search code examples
sqlpostgresqlconcurrencytriggersplpgsql

Locking rows behaviour for a temporary table in a PLPGSQL function


I have two tables in my DB: my source of truth table (call it table A) and a denormalized table (call it table B).

A row in table A, will have a column status that is a BOOLEAN, and table B only contains a row if the status in table A is true.

Since table B is a new addition to my DB, I need to populate it from the already-existing data in table A. I use a PLPGSQL function to do that:

CREATE OR REPLACE FUNCTION backfill_single_batch() AS
$$
...
CREATE TEMP TABLE temp ON COMMIT DROP AS
    SELECT ...
    FROM tableA
    WHERE status IS TRUE
    FOR UPDATE;

CREATE TEMP TABLE inserted_rows ON COMMIT DROP AS
    WITH ins AS (
        INSERT INTO tableB
            SELECT * FROM temp
            ON CONFLICT (..) DO NOTHING
            RETURNING *)
    SELECT *
    FROM ins;
...

Note the FOR UPDATE in the first query. I believe this is needed so that if a row changes from TRUE to FALSE after the first CREATE TEMP, I do not want to insert it.

Meanwhile, of course, I have a trigger that deletes rows from tableB whenever a status is updated from TRUE to FALSE. Here it is:

CREATE OR REPLACE FUNCTION on_status_update()
    RETURNS TRIGGER AS $$
BEGIN
    IF OLD.status IS TRUE AND NEW.status IS FALSE THEN
        DELETE FROM tableB WHERE ...;
    ELSIF OLD.status IS FALSE AND NEW.status IS TRUE THEN
        INSERT INTO table B (...)
        SELECT ...
        FROM tableA
        WHERE ...
        ON CONFLICT DO NOTHING;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_on_status_update
    AFTER UPDATE ON tableA
    FOR EACH ROW
EXECUTE FUNCTION on_status_update();

The issue now:

I am seeing rows with status FALSE being inserted by my PLPGSQL function after they were removed by the trigger.

So the sequence of this is:

  1. A row is updated to FALSE
  2. Trigger kicks in and deletes the row from table B
  3. My PLPGSQL function, running at the same time, still thinks that that particular row should be inserted in table B, and ends up inserting the spurious row, resulting in incorrect state.

What am I doing wrong here? I would have expected that the FOR UPDATE in my function would be enough to guard against this scenario, as these rows would be locked until the function executes but evidently not.


Solution

  • As long as you define the trigger before you start your script for the already existing data, the inconsistency you describe cannot happen.

    One shortcoming in your setup is that you are missing triggers for INSERT and DELETE.

    Your bulk update procedure could be greatly simplified:

    WITH truerows AS (
       SELECT * FROM tablea
       WHERE bool
       FOR SHARE
    )
    INSERT INTO tableb
    SELECT * FROM truerows;
    

    A SHARE lock is sufficient to prevent concurrent data modifications.