Search code examples
mysqldatabasemariadbinnodbpercona

Understanding InnoDB deadlock log


I have a transaction which is performing multiple INSERT queries. At the same time, there might occur a job that is performing consistency check on the database run by another user, which is doing replication checksum calculations by issuing SELECT queries over the database.

The problem is that sometimes my app-level transaction deadlocks with the transaction from the tool which is performing these checks:

------------------------
LATEST DETECTED DEADLOCK
------------------------
2022-05-19 18:25:20 0x7f6eb63e3700
*** (1) TRANSACTION:
TRANSACTION 421588457956552, ACTIVE 0 sec fetching rows
mysql tables in use 2, locked 2
LOCK WAIT 1012 lock struct(s), heap size 123088, 85725 row lock(s)
MySQL thread id 15333390, OS thread handle 140086616594176, query id 2884722462 10.96.7.108 replication_checksum_user Sending data
REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'bar', 'bars', '17', 'foo+idx', '688438b1-63b7-4cdd-ba14-5ac2811fce08,688438b1-63b7-4cdd-ba14-5ac2811fce08,6149061644177471', '6f2aecfe-44b8-4b04-8913-5086c5402c02,6f2aecfe-44b8-4b04-8913-5086c5402c02,4356115808993199', COUNT(*) AS cnt, COALESCE(LOWER(CONCAT(LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING(CRC32(CONCAT_WS('#', convert(`field1` using utf8mb4), convert(`field2` using utf8mb4), convert(`field3` using utf8mb4), convert(`field4` using utf8mb4), `field5`, `field6`, `created_time`, `finished_time`, CONCAT(ISNULL(`notification_id`), ISNULL(`attempt_type`), ISNULL(`successful`), ISNULL(`finished_time`)))), 1, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0')))
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 109 page no 16087 n bits 192 index foo_idx of table `bar`.`bars` trx id 421588457956552 lock mode S waiting
Record lock, heap no 120 PHYSICAL RECORD: n_fields 10; compact format; info bits 0
 0: len 30; hex 36626361393436342d663735642d343134312d623231652d306130623961; asc 6bca9464-f75d-4141-b21e-0a0b9a; (total 36 bytes);
 1: len 16; hex 34373236323438353934373634393231; asc 4726248594764921;;
 2: len 6; hex 000028626bdf; asc   (bk ;;
 3: len 7; hex ba0004c0290d2e; asc     ) .;;
 4: SQL NULL;
 5: SQL NULL;
 6: SQL NULL;
 7: len 8; hex 8000000000000a07; asc         ;;
 8: len 5; hex 99ace72654; asc    &T;;
 9: SQL NULL;

*** (2) TRANSACTION:
TRANSACTION 677538783, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
5 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 3
MySQL thread id 15333007, OS thread handle 140113480660736, query id 2884722486 10.96.7.206 app_user Update
INSERT into bars(field1, field2, field3, created_time) VALUES('6bca9464-f75d-4141-b21e-0a0b9a7d06ca', '1656221208545861', 272.97000, CURRENT_TIMESTAMP())
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 109 page no 16087 n bits 192 index foo_idx of table `bar`.`bars` trx id 677538783 lock_mode X locks rec but not gap
Record lock, heap no 120 PHYSICAL RECORD: n_fields 10; compact format; info bits 0
 0: len 30; hex 36626361393436342d663735642d343134312d623231652d306130623961; asc 6bca9464-f75d-4141-b21e-0a0b9a; (total 36 bytes);
 1: len 16; hex 34373236323438353934373634393231; asc 4726248594764921;;
 2: len 6; hex 000028626bdf; asc   (bk ;;
 3: len 7; hex ba0004c0290d2e; asc     ) .;;
 4: SQL NULL;
 5: SQL NULL;
 6: SQL NULL;
 7: len 8; hex 8000000000000a07; asc         ;;
 8: len 5; hex 99ace72654; asc    &T;;
 9: SQL NULL;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 109 page no 16087 n bits 192 index foo_idx of table `bar`.`bars` trx id 677538783 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 120 PHYSICAL RECORD: n_fields 10; compact format; info bits 0
 0: len 30; hex 36626361393436342d663735642d343134312d623231652d306130623961; asc 6bca9464-f75d-4141-b21e-0a0b9a; (total 36 bytes);
 1: len 16; hex 34373236323438353934373634393231; asc 4726248594764921;;
 2: len 6; hex 000028626bdf; asc   (bk ;;
 3: len 7; hex ba0004c0290d2e; asc     ) .;;
 4: SQL NULL;
 5: SQL NULL;
 6: SQL NULL;
 7: len 8; hex 8000000000000a07; asc         ;;
 8: len 5; hex 99ace72654; asc    &T;;
 9: SQL NULL;

*** WE ROLL BACK TRANSACTION (2)

I'm having difficulties in understanding that log. What is the common resource that the two transactions are fighting over? Is it the index record for that particular row that is being inserted (with id = 6bca9464-f75d-4141-b21e-0a0b9a7d06ca)?

From what I understand and read about, the first transaction is trying to acquire S lock (which is for reading), the second transaction holds an X lock (exclusive lock) for the insert. But what is that the second transaction is waiting for that the first transaction is holding? To me it seems that the second transaction is waiting on a lock that it is holding itself - is this wrong?

Also, is it possible that the two transactions are waiting on a lock that is not held by them?

What does the gap and no gap part means?

What is the preferred way of solving this type of issue?

Edit: This is the schema

CREATE TABLE `bars` (
    `field1` VARCHAR(50) NOT NULL,
    `field2` VARCHAR(50) NOT NULL,
    `field3` VARCHAR(50) NULL DEFAULT NULL,
    `field4` ENUM('WHOLE','PARTIAL') NOT NULL,
    `field5` TINYINT(4) NULL DEFAULT NULL,
    `field6` DECIMAL(16,2) NOT NULL,
    `created_time` DATETIME NOT NULL,
    `finished_time` DATETIME NULL DEFAULT NULL,
    UNIQUE INDEX `foo_idx` (`field1`, `field2`),
    CONSTRAINT `f1_idx` FOREIGN KEY (`field1`) REFERENCES `bays` (`field1`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;

and then two migrations afterwards

ALTER TABLE `bars`
ADD FOREIGN KEY `ff_fk` (`field1`, `field2`)
REFERENCES `fields_references` (`field1`, `field2`);
ALTER TABLE bars
DROP CONSTRAINT ff_fk;

Solution

  • The contention is between the REPLACE run by pt-table-checksum and the INSERT run by your app.

    You asked about locking SELECT. An ordinary SELECTs is lock-free by default, but if a SELECT is part of a locking statement, it implicitly becomes a locking SELECT. That is, because pt-table-checksum runs REPLACE INTO checksums ... SELECT FROM bars, it must acquire an S lock on some rows in bars.

    The same is true of any SELECT that is used in a statement that writes data to a row or a variable.

    • CREATE TABLE ... SELECT ...
    • INSERT INTO ... SELECT ...
    • SET @variable = (SELECT ...)
    • SELECT ... INTO @variable
    • Using SELECT in subqueries in an UPDATE or DELETE statement
    • Using SELECT in a trigger

    Changing your app's transaction isolation level to READ-COMMITTED could help avoid the gap locks. But maybe pt-table-checksum has its own transaction isolation level that will create gap locks anyway.

    The bottom line is that deadlocks happen. You can't eliminate them all, they're a natural outcome when locking is done by concurrent sessions. So you should design code to catch exceptions and retry as needed.


    Re your comment:

    Transaction 1 doesn't necessarily hold another lock. The following sequence of actions could result in a deadlock:

    1. Transaction 2 (your app) acquires some lock on a single row in bars.

    2. Transaction 1 (pt-table-checksum) needs locks on a batch of multiple rows. At least one is already locked by transaction 2, so pt-table-checksum waits.

    3. Transaction 2 wants a lock on another row it didn't have a lock on before, but the row is part of the batch that pt-table-checksum is trying to lock.