I have the following rake task:
namespace :backfill do
  desc "backfill device id data"
  task device_ids: :environment do
    user_count = User.where.not(device_id: nil).count
    puts "Begin device_id backfill for #{user_count} users"
    batch_size, offset = 2000, 0
    begin
      puts "Beginning backfill batch from rows #{offset} to #{offset + batch_size} out of #{user_count} rows"
      ActiveRecord::Base.connection.execute <<-SQL
        INSERT INTO user_device_infos (user_id, user_device_id, last_login_at, created_at, updated_at)
        SELECT users.id, users.device_id, users.last_sign_in_at, current_timestamp, current_timestamp
        FROM users
        LEFT JOIN user_device_infos
          ON users.id = user_device_infos.user_id
          AND users.device_id = user_device_infos.user_device_id
        WHERE users.device_id IS NOT NULL
          AND user_device_infos.user_id IS NULL
        ORDER BY users.created_at DESC
        LIMIT #{batch_size}
        OFFSET #{offset}
      SQL
      offset += batch_size
    end until offset > user_count
    puts "backfill complete"
  end
end
This job copies data from the users header table into a user_device_infos sub table. It runs against a production db, hence the mini-batches to avoid holding locks for too long. It has a few problems.
How can I fix this job and verify that each batch actually runs successfully?
The problem was that I didn't need the OFFSET in the SQL at all: the LEFT JOIN with the user_device_infos.user_id IS NULL condition already excludes the rows inserted by previous batches, so each batch naturally picks up the next unprocessed rows, and the OFFSET was skipping rows that still needed to be backfilled. Removing the OFFSET both sped up and fixed the job.
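The batch loop ends up looking roughly like this (a sketch only; processed replaces offset as a plain progress counter used for logging and the stop condition, and is no longer interpolated into the SQL):

processed, batch_size = 0, 2000
begin
  puts "Beginning backfill batch from rows #{processed} to #{processed + batch_size} out of #{user_count} rows"
  ActiveRecord::Base.connection.execute <<-SQL
    INSERT INTO user_device_infos (user_id, user_device_id, last_login_at, created_at, updated_at)
    SELECT users.id, users.device_id, users.last_sign_in_at, current_timestamp, current_timestamp
    FROM users
    LEFT JOIN user_device_infos
      ON users.id = user_device_infos.user_id
      AND users.device_id = user_device_infos.user_device_id
    WHERE users.device_id IS NOT NULL
      AND user_device_infos.user_id IS NULL
    ORDER BY users.created_at DESC
    LIMIT #{batch_size}
  SQL
  processed += batch_size
end until processed >= user_count  # >= avoids running one final empty batch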
Saving the output of execute to a variable and printing result.cmd_status helped verify that each batch completed successfully.
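Concretely, that check looks something like this (assuming the PostgreSQL adapter, where execute returns a PG::Result; insert_sql here stands in for the batch INSERT shown above):

result = ActiveRecord::Base.connection.execute(insert_sql)
puts result.cmd_status   # command tag, e.g. "INSERT 0 2000"
puts result.cmd_tuples   # number of rows this batch actually inserted

result.cmd_tuples also makes a handy stopping condition: once a batch inserts fewer than batch_size rows, the backfill has caught up and the loop can break.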