Tags: postgresql, go, transactions, go-gorm, transaction-isolation

Maintain integrity on concurrent updates of the same row


In the following code snippet, I try to find, delete, and re-create the same item in two different transactions on two different threads.

In thread 1, I create transaction 1, find the item, and delete it.

Once that is done, I allow thread 2 to create transaction 2 and try to Find() the item. The Find() method blocks here, since I use the FOR UPDATE option.

Back in thread 1, the item gets re-created and transaction 1 is committed, which allows the Find() in thread 2 to complete. Two problems arise here:

If I use isolation level "ReadCommitted", I get a not found error - this makes no sense to me, because I thought that a ReadCommitted transaction can see updates applied by others.

If I use isolation level "Serializable", I get the error: pq: could not serialize access due to concurrent update.

Why am I seeing this behaviour? I would expect the second Find() to return the latest row once it unblocks.

How can I make it so that while a row is being modified, any other reads block, and then return the most up-to-date data once the modifying transaction completes?

package main

import (
    "context"
    "database/sql"
    "time"

    "github.com/jinzhu/gorm"
    _ "github.com/jinzhu/gorm/dialects/postgres"
)

// Product is the model used throughout; ID is the auto-generated primary key.
type Product struct {
    ID    uint
    Code  string
    Price uint
}

func main() {
    db, err := gorm.Open("postgres", "host=localhost port=5432 user=postgres dbname=test-rm password=postgres sslmode=disable")
    if err != nil {
        panic("failed to connect database")
    }
    db.SingularTable(true)
    db.DropTableIfExists(&Product{})
    db.AutoMigrate(&Product{})

    db.Create(&Product{Code: "A", Price: 1000})
    // SQL: INSERT INTO "product" ("code","price") VALUES ('A',1000) RETURNING "product"."id"

    txOpt := &sql.TxOptions{Isolation: sql.LevelSerializable}

    doneTrans1 := make(chan struct{})

    go func() {
        item1 := &Product{}

        tx1 := db.BeginTx(context.Background(), txOpt)

        err = tx1.Set("gorm:query_option", "FOR UPDATE").Find(item1, "code = ?", "A").Error
        // SQL: SELECT * FROM "product" WHERE (code = 'A') FOR UPDATE

        item1.Price = 3000

        err = tx1.Delete(item1).Error
        // SQL: DELETE FROM "product" WHERE "product"."id" = 1

        doneTrans1 <- struct{}{}
        time.Sleep(time.Second * 3)

        err = tx1.Create(item1).Error
        // SQL: INSERT INTO "product" ("id","code","price") VALUES (1,'A',3000) RETURNING "product"."id"

        tx1.Commit()
    }()

    // Ensure the other transaction's work has started
    <-doneTrans1
    time.Sleep(time.Second * 2)

    item2 := &Product{}

    tx2 := db.BeginTx(context.Background(), txOpt)

    err = tx2.Set("gorm:query_option", "FOR UPDATE").Find(item2, "code = ?", "A").Error
    // SQL: SELECT * FROM "product" WHERE (code = 'A') FOR UPDATE
    // ERROR occurs here

    item2.Price = 5000
    err = tx2.Delete(item2).Error
    err = tx2.Create(item2).Error
    tx2.Commit()
    time.Sleep(time.Second * 5)
}

Solution

  • To answer this I think it's best to remove the complexity of the goroutine (and, in fact, Go at all) and focus on the SQL. The following are the SQL statements in the order they will be run (I have ignored everything after the error occurs, as it is mostly irrelevant and the order of execution gets complex/variable!).

    In the main routine

    INSERT INTO "product" ("code","price") VALUES ('A',1000) RETURNING "product"."id"
    

    In GoRoutine

    BEGIN TX1
    SELECT * FROM "product"  WHERE (code = 'A') FOR UPDATE
    DELETE FROM "product"  WHERE "product"."id" = 1
    

    In the main routine

    BEGIN TX2
    SELECT * FROM "product"  WHERE (code = 'A') FOR UPDATE -- ERROR occurs here
    

    Now, on to your questions.

    Question 1

    If I use isolation level "ReadCommitted", I get a not found error - this makes no sense to me, because I thought that a ReadCommitted transaction can see updates applied by others.

    From the docs for Read Committed Isolation Level:

    UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the command start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the second updater can proceed with updating the originally found row. If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row.

    So the SELECT * FROM "product" WHERE (code = 'A') FOR UPDATE in TX2 will wait for TX1 to complete. At that point TX1 has deleted product A, so the row is ignored and no results are returned. TX1 does also recreate product A, but remember that "a SELECT query (without a FOR UPDATE/SHARE clause) sees only data committed before the query began"; as the SELECT began before TX1 recreated the record, the new row will not be seen.
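    Incidentally, this also points at the Read Committed escape hatch: because each statement takes a fresh snapshot, simply re-running the query inside the same transaction will see the recreated row. A minimal sketch, reusing db and Product from your code (gorm.IsRecordNotFoundError is part of GORM v1):

    tx2 := db.BeginTx(context.Background(), &sql.TxOptions{Isolation: sql.LevelReadCommitted})
    item2 := &Product{}
    err := tx2.Set("gorm:query_option", "FOR UPDATE").Find(item2, "code = ?", "A").Error
    if gorm.IsRecordNotFoundError(err) {
        // A new statement gets a new snapshot, so this attempt sees the
        // row TX1 recreated (and locks it).
        err = tx2.Set("gorm:query_option", "FOR UPDATE").Find(item2, "code = ?", "A").Error
    }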

    Question 2

    If I use isolation level "Serializable", I get the error: pq: could not serialize access due to concurrent update.

    From the docs for Repeatable Read Isolation Level (Serializable is a higher level so these rules, plus some stricter ones, apply):

    UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the transaction start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the repeatable read transaction will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the repeatable read transaction can proceed with updating the originally found row. But if the first updater commits (and actually updated or deleted the row, not just locked it) then the repeatable read transaction will be rolled back with the message: ERROR: could not serialize access due to concurrent update

    In your code TX1 updates product A, meaning that the query in TX2 is delayed until TX1 commits, at which point it aborts with that error (had TX1 rolled back, it would have continued).
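    With lib/pq, this surfaces as a *pq.Error carrying SQLSTATE 40001 (serialization_failure), which you can test for when deciding whether to retry. A small helper sketch (the function name is mine):

    import "github.com/lib/pq"

    // isSerializationFailure reports whether err is a conflict worth retrying.
    func isSerializationFailure(err error) bool {
        pqErr, ok := err.(*pq.Error)
        return ok && pqErr.Code == "40001" // serialization_failure
    }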

    How can I make the second update happen?

    Maintaining transactional integrity is a hard problem, and the functionality in PostgreSQL is the result of a lot of work by some very smart people. If you find yourself fighting the database, it's often a good idea to take a step back and consider whether you need to change your approach (or whether the problem you perceive is a real issue).

    In your example you have two routines that delete and recreate the same record; I cannot foresee a situation where you would want both transactions to proceed. In a real system where this was possible, you would not have carefully arranged timers to ensure one transaction starts first, so the state of the database after both transactions complete would depend on which one reached the SELECT * FROM "product" WHERE (code = 'A') FOR UPDATE first. In reality, then, it does not matter if one fails (the result is pretty much random in any event); it is actually the better outcome, because you can advise the user (who can check the record and rerun the task if needed).

    So before reading the rest of this, I would suggest you consider whether this is a problem at all (I have no background on what you are trying to accomplish, so it's difficult to comment).

    If you really want to ensure the update proceeds you have a few options:

    • If using "Serializable" you need to detect the failure and retry the transaction (if thats what the business logic demands)
    • If using "Read committed" then replace the DELETE/INSERT with an UPDATE (in that case PostgreSQL will re-evaluate the WHERE clause when the first transactions lock is released).

    However, I feel that a better approach is to do away with much of this and attempt to perform updates like this in a single step (which may mean bypassing the ORM). If you want to minimise the likelihood of issues like this, then minimising the number and duration of locks is important, and performing the operation in a single step helps considerably. For complicated operations, using a stored procedure speeds things up, but there is still a (reduced) chance of a conflict with other concurrently running operations.
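    For example, the find/modify/delete/create sequence in your code collapses into a single statement (dropping to raw SQL via GORM's Exec; the price of 3000 comes from your snippet):

    // One statement means one short-lived row lock and nothing for a
    // concurrent transaction to interleave with.
    db.Exec("UPDATE product SET price = ? WHERE code = ?", 3000, "A")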

    You may also want to take a look at Optimistic Locking, because in some cases it makes more sense (e.g. where you read info, display it to the user, and wait for changes, but in the meantime another user could have made changes).
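    For illustration, a sketch of the version-column flavour of optimistic locking (the integer version column is an assumption here; your Product model does not have one):

    // versionReadEarlier is the "version" value read when the data was first
    // shown to the user (hypothetical variable).
    res := db.Exec(
        "UPDATE product SET price = ?, version = version + 1 WHERE code = ? AND version = ?",
        5000, "A", versionReadEarlier)
    if res.RowsAffected == 0 {
        // Lost the race: another writer changed the row first. Re-read it,
        // show the user the new state, and retry if still appropriate.
    }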