Tags: ruby-on-rails, database, import, concurrency, rake

Rails (rake) Data Import Concurrently


I am trying to migrate from a Rails-based service to another framework. I am exporting my data from Rails using rake tasks and importing it into the new schema.

Right now my rake tasks do all of the column mappings, and that part works fine. For example, Customers in the Rails app are now Accounts in the new application.

The problem is that my tasks take hours to complete. Essentially, I run User.find_each, then Transaction.find_each, and so on. Each of these tables has tens of thousands of records.

I did a first pass at optimization and removed as many database calls as I could. I am also using Redis where I can. It seems to me that I have reached the point where I need the tasks to run concurrently.

I looked into using the parallel gem. The example in the documentation is the following:

Parallel.each(User.all, in_processes: 8) do |user|
  user.update_attribute(:some_attribute, some_value)
end
User.connection.reconnect!

I am worried I can't use that: when I call Customer.all, my VM freezes because the records don't all fit in memory (hence the find_each).

My question is: is it possible to use the parallel gem with find_each? I cannot find anything in its documentation or in examples online that does this. Is there another way to iterate over the Customers concurrently?


Solution

  • For the question,

    is it possible to use the parallel gem with find_each? I cannot find anything in their documentation or examples online doing such. Is there another solution for iterating over the Customers concurrently?

    I would recommend using ActiveRecord's find_in_batches. It loads one batch of records at a time (1,000 by default), and you can then iterate over the elements of each batch with Parallel, so only one batch is held in memory at once. For example, it could look something like this:

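    # Load the users one batch at a time instead of all at once
    User.find_in_batches do |batch|
      # Spread the records of the current batch across 8 worker processes
      Parallel.each(batch, in_processes: 8) do |user|
        ...
      end
    end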
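
To see the batch-then-parallelize shape without a Rails app, here is a minimal sketch using only the Ruby standard library. It is a stand-in, not the answer's exact code: `each_batch` plays the role of `find_in_batches` and `process_batch_concurrently` plays the role of `Parallel.each`, using threads rather than processes. Both helper names, the in-memory `customers` data, and the `account_id` / `account_name` fields are hypothetical and chosen only for illustration.

```ruby
# Stand-in for ActiveRecord's find_in_batches (hypothetical helper):
# yields arrays of at most batch_size records, so the whole data set
# is never held in memory at once.
def each_batch(records, batch_size:)
  records.each_slice(batch_size) { |batch| yield batch }
end

# Stand-in for Parallel.each on one batch (hypothetical helper):
# a fixed number of worker threads pull items from a shared queue.
# Items must not be nil or false, since nil signals "queue drained".
def process_batch_concurrently(batch, workers:)
  queue = Queue.new
  batch.each { |item| queue << item }
  queue.close # pop returns nil once the closed queue is empty

  results = Queue.new
  threads = Array.new(workers) do
    Thread.new do
      while (item = queue.pop)
        results << yield(item)
      end
    end
  end
  threads.each(&:join)

  # Drain the results; their order depends on thread scheduling.
  Array.new(results.size) { results.pop }
end

# Hypothetical "customers" source; lazy, so rows are produced on demand.
customers = (1..25).lazy.map { |id| { id: id, name: "customer-#{id}" } }

accounts = []
batch_sizes = []
each_batch(customers, batch_size: 10) do |batch|
  batch_sizes << batch.size
  mapped = process_batch_concurrently(batch, workers: 4) do |customer|
    # Assumed column mapping from a Customer row to an Account row.
    { account_id: customer[:id], account_name: customer[:name] }
  end
  accounts.concat(mapped)
end

puts "batches: #{batch_sizes.inspect}, accounts: #{accounts.size}"
```

Only one batch is alive at a time, and the work inside it is spread across the workers. Results within a batch come back in no particular order, so anything that depends on ordering should sort afterwards or key on the record id. With the parallel gem itself, worker processes would likely need their database connection re-established, which is presumably why the gem's documentation example calls `User.connection.reconnect!` after the parallel block.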