Search code examples
mysqlruby-on-railsmultithreadingdatabase-deadlocksdatabase-cleaner

Lock wait timeout when running spec of a multithread code


I have the following class which changes all user names to John in many threads

class Users::UpdateName
  def call
    User
      .unscoped
      .find_in_batches do |batch|
        update_batch(batch)
      end
  end

  def update_batch(users_batch)
    users_batch
      .each_slice(100)
      .map { |users| Thread.new { update_all(users) } }
      .each(&:join)
  end

  def update_all(users)
    users.each { |u| u.update_columns(first_name: 'John') }
  end
end

where class User is an Active Record model with default scope default_scope { where(archived: false) }

I test my class the following way:

describe Users::UpdateName

  it 'should update first name of an archived user' do
    user = User.create(first_name: 'Jane', archived: true)

    Users::UpdateName.new.call

    expect(user.reload.first_name).to eq 'John'
  end

end

but when I run the tests it fails with

 ActiveRecord::StatementInvalid:
   Mysql2::Error: Lock wait timeout exceeded; try restarting transaction: UPDATE `users` SET `users`.`first_name` = 'John' WHERE `users`.`id` = 10

When I inspect the db state with show engine innodb status; it gives me

TRANSACTIONS
------------
Trx id counter 4359880
Purge done for trx's n:o < 4359879 undo n:o < 0 state: running but idle
History list length 1034
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 198, OS thread handle 0x700005048000, query id 43397 localhost root init
show engine innodb status
---TRANSACTION 4359879, ACTIVE 13 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 360, 1 row lock(s)
MySQL thread id 200, OS thread handle 0x700004f7c000, query id 43396 localhost root updating
UPDATE `users` SET `users`.`first_name` = 'John' WHERE `users`.`id` = 12
------- TRX HAS BEEN WAITING 13 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 173721 page no 3 n bits 72 index `PRIMARY` of table `mysql_lock_wait_timeout_problem_test`.`users` trx id 4359879 lock_mode X locks rec but not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
 0: len 8; hex 800000000000000c; asc         ;;
 1: len 6; hex 0000004286c2; asc    B  ;;
 2: len 7; hex d6000001510110; asc     Q  ;;
 3: len 4; hex 4a616e65; asc Jane;;
 4: len 1; hex 81; asc  ;;
 5: len 5; hex 99a034ac56; asc   4 V;;
 6: len 5; hex 99a034ac56; asc   4 V;;

------------------
---TRANSACTION 4359874, ACTIVE 13 sec
2 lock struct(s), heap size 360, 1 row lock(s), undo log entries 1
MySQL thread id 199, OS thread handle 0x700004ef4000, query id 43393 localhost root
Trx read view will not see trx with id >= 4359879, sees < 4359879
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (read thread)
I/O thread 4 state: waiting for i/o request (read thread)
I/O thread 5 state: waiting for i/o request (read thread)
I/O thread 6 state: waiting for i/o request (write thread)
I/O thread 7 state: waiting for i/o request (write thread)
I/O thread 8 state: waiting for i/o request (write thread)
I/O thread 9 state: waiting for i/o request (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
1320 OS file reads, 89466 OS file writes, 79251 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.38 writes/s, 0.24 fsyncs/s
-------------------------------------

Do you have any idea why the lock timeout above happens?

When I remove multithread code the spec passes:

  def update_batch(users_batch)
    update_all(users_batch)
  end

You can find out how to initialize the app to reproduce the error in this gist. Also you can find all the source code in Github


Solution

  • I was able to reproduce this bug with the simpler code:

    ActiveRecord::Base.transaction do
      user = User.create! first_name: 'Jane'
    
      t = Thread.new do
        user.update_columns first_name: 'John'
      end
    
      t.join
    end
    

    Explanation:

    The main thread creates a record within a transaction and marks this record as not available for updates. At the same time another thread tries to update that record and waits for the main transaction. And the main transaction cannot be completed because it waits for another thread

    where class User is an Active Record model with default scope default_scope { where(archived: false) }

    Default scope in the user class doesn't play any role there