Search code examples
sqlpostgresqlconcurrencylocking

PostgreSQL select for update lock, new rows


I have the following concurrency use-case: An endpoint can be called at any time and an operation is supposed to happen. The operation goes like this in pseudocode (current isolation level is READ COMMITTED):

SELECT * FROM TABLE_A WHERE IS_LATEST=true FOR UPDATE
// DO SOME APP LOGIC TO TEST VALIDITY
// ALL GOES WELL => INSERT OR UPDATE NEW ROW WITH IS_LATEST=TRUE => COMMIT
// OTHERWISE => ROLLBACK (all good not interesting)

Now this approach with SELECT FOR UPDATE is fine if two of these operations start at the same time in the respects of update. Because both transactions see the same number of rows, one will update the rows and the second transaction will wait its turn before being able to SELECT FOR UPDATE and the state is valid.

The issue I have is when I have an insert in the first transaction. What happens is that for example when the first transaction makes that lock SELECT FOR UPDATE there are two rows, then the transaction continues, in the middle of the transaction, the second transaction comes in wanting to SELECT FOR UPDATE (latest) and waits for first transaction to finish.. The first transaction finished and there is a new third item realistically in the db, but the second transaction picks up only two rows while it was waiting for the row locks to be released. (This is because at the time of calling the SELECT FOR UPDATE the snapshot was different had only two rows that matched IS_LATEST=true).

Is there a way to make this transaction such that the SELECT lock picks up the latest snapshot after waiting?


Solution

  • The issue is that each command only sees rows that have been committed before the query started. There are various possible solutions ...

    Stricter isolation level

    You can solve this with a stricter isolation level, but that's relatively expensive.

    Laurenz already provided a solution for this.

    Just start a new command

    Keep the (cheap) default isolation level READ COMMITTED, and just start a new command.

    Only few rows to lock

    While only locking a hand full of rows, the dead simple solution is to repeat the same SELECT ... FOR UPDATE. The second iteration sees newly committed rows and locks them additionally.

    There is a theoretical race condition with additional transactions that might lock new rows before the waiting transaction does. That would result in a deadlock. Highly unlikely, but to be absolutely sure, lock rows in consistent order:

    BEGIN;  -- default READ COMMITTED
    
    SELECT FROM table_a WHERE is_latest ORDER BY id FOR UPDATE;  -- consistent order
    SELECT * FROM table_a WHERE is_latest ORDER BY id FOR UPDATE;  -- just repeat !!
    
    --  DO SOME APP LOGIC TO TEST VALIDITY
    
    -- pseudo-code
    IF all_good
       UPDATE table_a SET is_latest = true WHERE ...;
       INSERT table_a (IS_LATEST, ...) VALUES (true, ...);
       COMMIT;
    ELSE
       ROLLBACK;
    END; 
    

    A partial index on (id) WHERE is_latest would be ideal.

    More rows to lock

    For more than a hand full of rows, I would instead create a dedicated one-row token table. A bullet-proof implementation could look like this, run as admin or superuser:

    CREATE TABLE public.single_task_x (just_me bool CHECK (just_me) PRIMARY KEY DEFAULT true);
    INSERT INTO public.single_task_x VALUES (true);
    REVOKE ALL ON public.single_task_x FROM public;
    GRANT SELECT, UPDATE ON public.single_task_x TO public;  -- or just to those who need it
    

    See:

    Then:

    BEGIN;  -- default READ COMMITTED
    
    SELECT FROM public.single_task_x FOR UPDATE;
    SELECT * FROM table_a WHERE is_latest;  -- FOR UPDATE? ①
    
    --  DO SOME APP LOGIC TO TEST VALIDITY
    
    -- pseudo-code
    IF all_good
       ROLLBACK;
    ELSE
       UPDATE table_a SET is_latest = true WHERE ...;
       INSERT table_a (IS_LATEST, ...) VALUES (true, ...);
       COMMIT;
    END; 
    

    A single lock is cheaper.
    ① You may or may not want to lock additionally, to defend against other writes, possibly with a weaker lock ....

    Either way, all locks are released at the end of the transaction automatically.

    Advisory lock

    Or use an advisory lock. pg_advisory_xact_lock() persists for the duration of the transaction:

    BEGIN;  -- default READ COMMITTED
    
    SELECT pg_advisory_xact_lock(123);
    SELECT * FROM table_a WHERE is_latest;
    
    -- do stuff
    
    COMMIT;  -- or ROLLBACK; 
    

    Make sure to use a unique token for your particular task. 123 in my example. Consider a look-up table if you have many different tasks.

    To release the lock at a different point in time (not when the transaction ends), consider a session-level lock with pg_advisory_lock(). Then you can (and must) unlock manually with pg_advisory_unlock() - or close the session.

    Both of these wait for the locked resource. There are alternative functions returning false instead of waiting ...