Search code examples
ruby-on-rails-3delayed-jobdb-charmer

Strange out of order behavior with rails collective_idea/delayed_job gem


[ Cross posting issue I posted on github to widen the audience ]

We are seeing a strange behavior when using the delay method inside a transaction. It can easily be reproduced in a rails console with a User model and the following code.

Does anyone know if delayed_job does something internally that changes the order of database operations as seen below?

Steps to reproduce

# User model has first_name and last_name attributes
# users table has at least one test record

# The code below is for use in a rails console. Just copy, paste and run.

# a simple operation to queue up as a delayed job
def change_user_name(user_id)
   u = User.find(user_id)
   u.first_name = "Delaid"
   u.save
end

User.transaction do
  u2 = User.first
  u2.last_name = "Willie"
  u2.save
  delay.change_user_name(u2.id)
  u2.last_name = "Wonka"
  u2.save
end

Expected behavior

When this is run, we expect the insert into the delayed_jobs table to happen in the order of operations shown, and within the transaction we have specified. This is confirmed by what we see in the rails console log, as shown below.

2015-02-03 09:17:55 -0600 (9606) : DEBUG :  (0.3ms)  BEGIN
2015-02-03 09:17:55 -0600 (9606) : DEBUG :  User Load (0.7ms)  SELECT `users`.* FROM `users` WHERE (user_type_id not in ('6')) LIMIT 1
2015-02-03 09:17:55 -0600 (9606) : DEBUG :  SQL (0.4ms)  INSERT INTO `history_logs` (`action`, `created_at`, `data`, `organization_id`, `updated_at`, `user_id`) VALUES ('update via on 50', '2015-02-03 15:17:55', '{\"last_name\":[\"van Tilburg\",\"Willie\"]}', NULL, '2015-02-03 15:17:55', 0)
2015-02-03 09:17:55 -0600 (9606) : DEBUG :  (0.4ms)  UPDATE `users` SET `last_name` = 'Willie', `updated_at` = '2015-02-03 15:17:55' WHERE `users`.`id` = 50
2015-02-03 09:17:55 -0600 (9606) : DEBUG :  (0.2ms)  BEGIN
2015-02-03 09:17:55 -0600 (9606) : DEBUG :  SQL (0.6ms)  INSERT INTO `delayed_jobs` (`attempts`, `created_at`, `failed_at`, `handler`, `last_error`, `locked_at`, `locked_by`, `priority`, `queue`, `run_at`, `updated_at`) VALUES (0, '2015-02-03 15:17:55', NULL, '--- !ruby/object:Delayed::PerformableMethod\nobject: !ruby/object\n prompt:\n :PROMPT_I: ! \'1.9.3p125 :%03n > \'\n :PROMPT_S: ! \'1.9.3p125 :%03n%l> \'\n :PROMPT_C: ! \'1.9.3p125 :%03n > \'\n :PROMPT_N: ! \'1.9.3p125 :%03n?> \'\n :RETURN: ! \" => %s \\n\"\n :AUTO_INDENT: true\nmethod_name: :change_user_name\nargs:\n- 50\n', NULL, NULL, NULL, 0, NULL, '2015-02-03 15:17:55', '2015-02-03 15:17:55')
2015-02-03 09:17:55 -0600 (9606) : DEBUG :  (9.4ms)  COMMIT
2015-02-03 09:17:55 -0600 (9606) : DEBUG :  SQL (0.4ms)  INSERT INTO `history_logs` (`action`, `created_at`, `data`, `organization_id`, `updated_at`, `user_id`) VALUES ('update via on 50', '2015-02-03 15:17:55', '{\"last_name\":[\"Willie\",\"Wonka\"]}', NULL, '2015-02-03 15:17:55', 0)
2015-02-03 09:17:55 -0600 (9606) : DEBUG :   (0.4ms)  UPDATE `users` SET `last_name` = 'Wonka', `updated_at` = '2015-02-03 15:17:55' WHERE `users`.`id` = 50
2015-02-03 09:17:55 -0600 (9606) : DEBUG :   (3.5ms)  COMMIT
 => true

Observed (incorrect) behavior

However, mysql binlogs show something different, like below. Note how the INSERT into the delayed_jobs table happens before the other operations and how it is placed inside its own transaction.

BEGIN
/*!*/;
# at 2278762
#150203  9:17:55 server id 1  end_log_pos 2278790       Intvar
SET INSERT_ID=24398/*!*/;
# at 2278790
#150203  9:17:55 server id 1  end_log_pos 2279486       Query   thread_id=290135        exec_time=0     error_code=0
SET TIMESTAMP=1422976675/*!*/;
INSERT INTO `delayed_jobs` (`attempts`, `created_at`, `failed_at`, `handler`, `last_error`, `locked_at`, `locked_by`, `priority`, `queue`, `run_at`, `updated_at`) VALUES (0, '2015-02-03 15:17:55', NULL, '--- !ruby/object:Delayed::PerformableMethod\nobject: !ruby/object\n  prompt:\n    :PROMPT_I: ! \'1.9.3p125 :%03n > \'\n    :PROMPT_S: ! \'1.9.3p125 :%03n%l> \'\n    :PROMPT_C: ! \'1.9.3p125 :%03n > \'\n    :PROMPT_N: ! \'1.9.3p125 :%03n?> \'\n    :RETURN: ! \" => %s \\n\"\n    :AUTO_INDENT: true\nmethod_name: :change_user_name\nargs:\n- 50\n', NULL, NULL, NULL, 0, NULL, '2015-02-03 15:17:55', '2015-02-03 15:17:55')
/*!*/;
# at 2279486
#150203  9:17:55 server id 1  end_log_pos 2279513       Xid = 129277545
COMMIT/*!*/;
BEGIN
/*!*/;
# at 2279592
#150203  9:17:55 server id 1  end_log_pos 2279620       Intvar
SET INSERT_ID=626645/*!*/;
# at 2279620
#150203  9:17:55 server id 1  end_log_pos 2279929       Query   thread_id=290137        exec_time=0     error_code=0
SET TIMESTAMP=1422976675/*!*/;
INSERT INTO `history_logs` (`action`, `created_at`, `data`, `organization_id`, `updated_at`, `user_id`) VALUES ('update via  on 50', '2015-02-03 15:17:55', '{\"last_name\":[\"van Tilburg\",\"Willie\"]}', NULL, '2015-02-03 15:17:55', 0)
/*!*/;
# at 2279929
#150203  9:17:55 server id 1  end_log_pos 2280106       Query   thread_id=290137        exec_time=0     error_code=0
SET TIMESTAMP=1422976675/*!*/;
UPDATE `users` SET `last_name` = 'Willie', `updated_at` = '2015-02-03 15:17:55' WHERE `users`.`id` = 50
/*!*/;
# at 2280106
#150203  9:17:55 server id 1  end_log_pos 2280134       Intvar
SET INSERT_ID=626646/*!*/;
# at 2280134
#150203  9:17:55 server id 1  end_log_pos 2280437       Query   thread_id=290137        exec_time=0     error_code=0
SET TIMESTAMP=1422976675/*!*/;
INSERT INTO `history_logs` (`action`, `created_at`, `data`, `organization_id`, `updated_at`, `user_id`) VALUES ('update via  on 50', '2015-02-03 15:17:55', '{\"last_name\":[\"Willie\",\"Wonka\"]}', NULL, '2015-02-03 15:17:55', 0)
/*!*/;
# at 2280437
#150203  9:17:55 server id 1  end_log_pos 2280613       Query   thread_id=290137        exec_time=0     error_code=0
SET TIMESTAMP=1422976675/*!*/;
UPDATE `users` SET `last_name` = 'Wonka', `updated_at` = '2015-02-03 15:17:55' WHERE `users`.`id` = 50
/*!*/;
# at 2280613
#150203  9:17:55 server id 1  end_log_pos 2280640       Xid = 129277538
COMMIT/*!*/;

Additional details

We also tested the scenario without the outer transaction (like below) and the same behavior was seen. Meaning, the INSERT into delayed_jobs happen before the other code and out of order.

# This code without an outer transaction results in the same buggy behavior
u2 = User.first
u2.last_name = "Willie"
u2.save
delay.change_user_name(u2.id)
u2.last_name = "Wonka"
u2.save

Further, it does not matter if we use save or save! in the various AR operations in the above code. The issue happens regardless.

Version Info

MySQL version

Variable_name   Value
innodb_version  5.5.31
protocol_version    10
slave_type_conversions  
version 5.5.31-0ubuntu0.12.04.1-log
version_comment (Ubuntu)
version_compile_machine x86_64
version_compile_os  debian-linux-gnu

Rails and gems

rails-3.2.17
delayed_job-4.0.4
delayed_job_active_record-4.0.2

Solution

  • After a lot of printf debugging inside delayed_job and active_record code, I figured out the problem. We have been using a gem called db_charmer to implement connection switching between primary (rw) and replica (ro) instance of our database and this gem under the covers implements connection switching logic. This was interfering with the transaction context logic in ActiveRecord.

    The following monkey-patch fixed the issue.

    I figured out the issue. We are using the [now defunct] db_charmer gem to do connection switching between primary and secondary database, and that was not taking effect on the delayed job model class. This causes us to use different connection instances for delayed_job vs other calls, and the transactions are treated separately.

    The following monkey-patch fixed the issue.

    class Delayed::Backend::ActiveRecord::Job
      # use the same as what you use for all your other AR model classes
      db_magic :connection => :primary, :slave => :replica, :force_slave_reads => false
    end
    

    The db_charmer gem is no longer supported and I need to figure out an alternative :-( In the meantime, this patch should work.