Tags: multithreading, ruby-on-rails-4, jruby, jrubyonrails

Multithreaded model building in Rails 4 with JRuby


I'm trying to optimize/multi-thread building a very large number of models (300+) all at once, to speed up the creation of this table that is saved to the database in my Rails 4 app.

I tried to move as many object references as possible outside of the threads, using things like memo variables, but I'm not sure what else to try.

The code I have is as follows. I tried to keep the code that is being multi-threaded as small as possible, but I keep running into circular dependency errors and/or not all of the records are created. Any help is appreciated.

Example 1:

 def create
    @checklist = Checklist.new(checklist_params)

    respond_to do |format|
      if @checklist.save

        tasks = Task.where(:active => true)
        checklist_date_as_time = Time.parse(@checklist.date.to_s).at_beginning_of_day
        checklist_id = @checklist.id.to_i
        threads = []

        ActiveRecord::Base.transaction do
          tasks.each do |task|
            time = task.start_time
            begin
              threads << Thread.new do
                complete_time = checklist_date_as_time + time.hour.hours + time.min.minutes
                task.responses.build( task_start_time: complete_time, must_complete_by: complete_time + task.time_window, checklist_id: checklist_id, task_id: task.id)
              end
            end while (time += task.frequency.minutes) < task.end_time
            threads.map(&:join)
            task.save
          end
        end

        format.html { redirect_to @checklist, notice: 'Checklist was successfully created.' }
        format.json { render :show, status: :created, location: @checklist }
      else
        format.html { render :new }
        format.json { render json: @checklist.errors, status: :unprocessable_entity }
      end
    end
 end

Solution

  • ActiveRecord is not "thread-safe": the framework does not define or guarantee the behaviour or correctness of a single record instance that is shared between threads.

    The easiest answer to your question would be to perform the whole tasks = ...; ActiveRecord::Base.transaction do ... work in a single background thread (frameworks such as DelayedJob might help), so that the "heavy" computation is not part of the response cycle.

    Also be aware that using multiple threads might cause you to use multiple connections, essentially draining the ActiveRecord connection pool. It also means that (depending on what happens during task.responses.build) ActiveRecord::Base.transaction { ... } might not do what you intend, because a transaction applies to one connection and multiple connection objects are involved.
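One way to follow this advice is to keep record instances out of the threads altogether: compute plain attribute hashes in a single background thread, and only touch ActiveRecord afterwards from one thread. The following is a minimal sketch in plain Ruby, not the asker's actual models. `TaskSpec` and `response_attributes` are hypothetical names, `frequency` is assumed to be in minutes, and `time_window` is assumed to be in seconds.

```ruby
# Hypothetical stand-in for the Task model: only the fields the loop reads.
TaskSpec = Struct.new(:id, :start_time, :end_time, :frequency, :time_window)

# Pure function: returns attribute hashes and shares no record instances.
# Mirrors the begin ... end while loop from the question (runs at least once).
def response_attributes(task, day_start, checklist_id)
  attrs = []
  time = task.start_time
  loop do
    start_at = day_start + time.hour * 3600 + time.min * 60
    attrs << {
      task_start_time: start_at,
      must_complete_by: start_at + task.time_window, # assumed seconds
      checklist_id: checklist_id,
      task_id: task.id
    }
    time += task.frequency * 60 # assumed minutes
    break unless time < task.end_time
  end
  attrs
end

day_start = Time.utc(2024, 1, 1)
tasks = [
  TaskSpec.new(1, Time.utc(2000, 1, 1, 9, 0), Time.utc(2000, 1, 1, 10, 0), 15, 600)
]

# A single background thread does all of the work; Thread#value joins it
# and returns the block's result.
worker = Thread.new do
  tasks.flat_map { |task| response_attributes(task, day_start, 42) }
end
rows = worker.value

puts rows.size
```

In the Rails app, each hash in `rows` could then be handed to `task.responses.build` from one thread inside the `ActiveRecord::Base.transaction` block, so only one connection is involved.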