Search code examples
pythonmysqldjangodeadlockselect-for-update

Deadlock with Django / MYSQL and filter on select_for_update


I am getting some deadlocks in my project where I have multiple processes altering the same objects in the database.

I have an endpoint whose point is to fetch the most recent unfinished play and then proceed to alter it if one exists, if not then it should create one. I need to make sure that if I receive 2 concurrent requests in this endpoint one of them creates the object and the other one blocks execution until that one is created and then the second one will alter the object created.

In my django application I have the following query for that inside transaction.atomic() context:

# Use list to force evaluation
play = list(Play.objects.select_for_update().filter(
                game=self.game,
                user=self.user,
                discard=False,
                finished=False,
            )
        )

It used to be .last() but I read that the ORDER BY it performs in the database query would sometimes raise problems with deadlocks so I tried this instead.

From my understanding (which is probably flawed) MYSQL should acquire an exclusive record lock on the finished index (I have that as index in my database) which would not allow for an object to be created while this transaction is occurring and any transaction that tries to would be blocked waiting for this one to complete. It should also acquire a record lock on the row itself so that no transaction can change its content

In heavy load testing I never had concurrency issues and it works as intended, however I sometimes find deadlocks in the databases which I dont fully understand with this particular code:

 with transaction.atomic():
        Play.objects.select_for_update().get(pk=play.pk)
        <Changes to the Play>
        play.save()

When doing SHOW ENGINE INNODB STATUS I got the following report which I edited for simplicity's sake:

*** (1) TRANSACTION:
TRANSACTION 901805890, ACTIVE 0 sec fetching rows
mysql tables in use 3, locked 3
LOCK WAIT 116 lock struct(s), heap size 24696, 5 row lock(s)
SELECT `*` FROM `games_play` WHERE (`games_play`.`discard` = 0 AND `games_play`.`finished` = 0 AND `games_play`.`game_id` = 1 AND `games_play`.`user_id` = 28) FOR UPDATE

*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 4268 page no 411 n bits 1616 index games_play_finished_71622b41 of table `test_dev`.`games_play` trx id 901805890 lock_mode X locks rec but not gap
Record lock, heap no 18 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 1; hex 80; asc  ;;
 1: len 4; hex 8000425c; asc   B\;;

Record lock, heap no 19 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 1; hex 80; asc  ;;
 1: len 4; hex 8000425d; asc   B];;


*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 4268 page no 969 n bits 80 index PRIMARY of table `test_dev`.`games_play` trx id 901805890 lock_mode X locks rec but not gap waiting
Record lock, heap no 6 PHYSICAL RECORD: n_fields 13; compact format; info bits 0
 0: len 4; hex 8000425d; asc   B];;
 1: len 6; hex 000035c07741; asc   5 wA;;
 2: len 7; hex 02000001a42d67; asc      -g;;
 3: len 1; hex 81; asc  ;;
 4: len 30; hex 0003006e00190004001d000400210008000029000c2d0004010062657473; asc    n         !    )  -    bets; (total 111 bytes);
 5: len 7; hex 63726173685f37; asc crash_7;;
 6: len 30; hex 00020073001200070019000e00002700004f00696e697469616c6f6e5f6c; asc    s          '  O initialon_l; (total 116 bytes);
 7: len 8; hex 99b07498ef076fa0; asc   t   o ;;
 8: len 8; hex 99b07498f40dce58; asc   t    X;;
 9: len 1; hex 81; asc  ;;
 10: len 4; hex 80000001; asc     ;;
 11: len 4; hex 8000000f; asc     ;;
 12: SQL NULL;


*** (2) TRANSACTION:
TRANSACTION 901805889, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1
UPDATE `games_play` SET `finished` = 1, `details` = '{}', `game_token` = '1234', `jwt_token` = NULL, `game_id` = 1, `user_id` = 15, `created` = '2023-06-26', `modified` = '2023-06-26', `discard` = 1 WHERE `games_play`.`id` = 16989

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 4268 page no 969 n bits 80 index PRIMARY of table `test_dev`.`games_play` trx id 901805889 lock_mode X locks rec but not gap
Record lock, heap no 6 PHYSICAL RECORD: n_fields 13; compact format; info bits 0
 0: len 4; hex 8000425d; asc   B];;
 1: len 6; hex 000035c07741; asc   5 wA;;
 2: len 7; hex 02000001a42d67; asc      -g;;
 3: len 1; hex 81; asc  ;;
 4: len 30; hex 0003006e00190004001d000400210008000029000c2d0004010062657473; asc    n         !    )  -    bets; (total 111 bytes);
 5: len 7; hex 63726173685f37; asc crash_7;;
 6: len 30; hex 00020073001200070019000e00002700004f00696e697469616c6f6e5f6c; asc    s          '  O initialon_l; (total 116 bytes);
 7: len 8; hex 99b07498ef076fa0; asc   t   o ;;
 8: len 8; hex 99b07498f40dce58; asc   t    X;;
 9: len 1; hex 81; asc  ;;
 10: len 4; hex 80000001; asc     ;;
 11: len 4; hex 8000000f; asc     ;;
 12: SQL NULL;


*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 4268 page no 411 n bits 1616 index games_play_finished_71622b41 of table `test_dev`.`games_play` trx id 901805889 lock_mode X locks rec but not gap waiting
Record lock, heap no 19 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 1; hex 80; asc  ;;
 1: len 4; hex 8000425d; asc   B];;


From what I understood from this report, the first transaction acquired the lock on the finished index to prevent insertion and was trying to acquire the lock on the play it self but that was already being held by the other transaction which also wanted to acquire the lock on the finished index. But these plays belong to different users which in my mind the first query should never acquire a lock to that.

I would like to understand what is going on here and what could be done to prevent this.


Solution

  • It appears you only have a single-column index on finished, e.g.:

    class Play(models.Model):
        finished = models.BooleanField(db_index=True)
        ...
    

    In that case, selecting-for-update an unfinished Play with .get(pk=play.pk) will first lock the PRIMARY index for the pk, and then attempt to lock the finished = 0 index record.

    Remove it and add an index on ["user_id", "finished"]:

    class Play(models.Model):
        finished = models.BooleanField()
        ...
    
        class Meta:
            indexes = [
                models.Index(fields=["user_id", "finished"]),
            ]
    

    In this case, the index lock will be for separate users, e.g. user_id = 15 AND finished = 0.

    Remember to run migrations:

    python manage.py makemigrations
    python manage.py migrate