ruby-on-rails ruby redis resque activerecord-import

Rails + Resque background job import not adding anything to the database


I have an issue with importing a lot of records from a user provided excel file into a database. The logic for this is working fine, and I’m using ActiveRecord-import to cut down on the number of database calls. However, when a file is too large, the processing can take too long and Heroku will return a timeout. Solution: Resque and moving the processing to a background job.

So far, so good. I’ve needed to add CarrierWave to upload the files to S3 because I can’t just hold the file in memory for the background job. The upload portion is also working fine: I created a model for the uploads and pass the record’s ID through to the queued job so it can retrieve the file later, as I understand I can’t pass a whole ActiveRecord object to the job.

I’ve installed Resque and Redis locally, and everything seems to be set up correctly in that regard. I can see the jobs I’m creating being queued and then run without failing. The job seems to run fine, but no records are added to the database. If I run the code from my job line by line in the console, the records are added to the database as I would expect. But when the queued jobs I’m creating run, nothing happens.

I can’t quite work out where the problem might be.

Here’s my upload controller’s create action:

def create
  @upload = Upload.new(upload_params)
  if @upload.save
    Resque.enqueue(ExcelImportJob, @upload.id)
    flash[:info] = 'File uploaded. Data will be processed and added to the database.'
    redirect_to root_path
  else
    flash[:warning] = 'Upload failed. Please try again.'
    render :new
  end
end

This is a simplified version of the job with fewer sheet columns for clarity:

class ExcelImportJob < ApplicationJob
  @queue = :default

  def perform(upload_id)
    file = Upload.find(upload_id).file.file.file
    data = parse_excel(file)
    if header_matches? data
      # Create a database entry for each row, ignoring the first header row
      # using activerecord-import
      sales = []
      data.drop(1).each do |row|
        sales << Sale.new(row)
        # Flush once a full batch of 2500 rows has been collected
        if sales.size >= 2500
          Sale.import sales
          sales = []
        end
      end
      # Import whatever is left over from the last partial batch
      Sale.import sales unless sales.empty?
    end
  end

  private

  def parse_excel(upload)
    # Open the uploaded excel document
    doc = Creek::Book.new upload

    # Map rows to the hash keys from the database
    doc.sheets.first.rows.map do |row|
      { date: row.values[0],
        title: row.values[1],
        author: row.values[2],
        isbn: row.values[3],
        release_date: row.values[5],
        units_sold: row.values[6],
        units_refunded: row.values[7],
        net_units_sold: row.values[8],
        payment_amount: row.values[9],
        payment_amount_currency: row.values[10] }
    end
  end

  # Returns true if header matches the expected format
  def header_matches?(data)
    data.first == {:date => 'Date',
                   :title => 'Title',
                   :author => 'Author',
                   :isbn => 'ISBN',
                   :release_date => 'Release Date',
                   :units_sold => 'Units Sold',
                   :units_refunded => 'Units Refunded',
                   :net_units_sold => 'Net Units Sold',
                   :payment_amount => 'Payment Amount',
                   :payment_amount_currency => 'Payment Amount Currency'}
  end
end

My logic could probably be improved anyway, since right now I’m holding the whole file in memory, but that isn’t the issue I’m having: even with a small file of only 500 or so rows, the job doesn’t add anything to the database.
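As an aside on the memory point, here is a minimal sketch of processing rows in batches without first collecting them into one array. It is plain Ruby and assumes only that the row source is an Enumerable that can be walked lazily. The helper name `each_batch` and the sample rows are hypothetical, and the block stands in for the `Sale.import` call.

```ruby
# Hypothetical helper: skips the header row and yields the remaining rows
# in arrays of at most `batch_size`, pulling rows from the source lazily.
def each_batch(rows, batch_size = 2500, &block)
  rows.lazy.drop(1).each_slice(batch_size, &block)
end

# Made-up sample data standing in for the parsed spreadsheet rows.
rows = [
  %w[Date Title],
  ['2017-10-01', 'Book A'],
  ['2017-10-02', 'Book B'],
  ['2017-10-03', 'Book C']
]

batches = []
# In the job, the block body would build Sale objects and call Sale.import.
each_batch(rows, 2) { |batch| batches << batch }
```

With a batch size of 2, the three data rows arrive as one batch of two and one batch of one. For this to save memory in the job itself, the row mapping would also have to stay lazy rather than calling `map` on the whole sheet, and the header check would need to read the first row separately.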

Like I said my code worked fine when I wasn’t using a background job, and still works if I run it in the console. But for some reason the job is doing nothing.

This is my first time using Resque, so I may be missing something obvious. I did create a worker and, as I said, it does seem to run the job. Here’s the output from Resque’s verbose formatter:

*** resque-1.27.4: Waiting for default
*** Checking default
*** Found job on default
*** resque-1.27.4: Processing default since 1508342426 [ExcelImportJob]
*** got: (Job{default} | ExcelImportJob | [15])
*** Running before_fork hooks with [(Job{default} | ExcelImportJob | [15])]
*** resque-1.27.4: Forked 63706 at 1508342426
*** Running after_fork hooks with [(Job{default} | ExcelImportJob | [15])]
*** done: (Job{default} | ExcelImportJob | [15])

In the Resque dashboard the jobs aren’t logged as failed. They get executed and I can see an increment in the ‘processed’ jobs on the stats page. But as I say the DB remains untouched. What’s going on? How can I debug the job more clearly? Is there a way to get into it with Pry?


Solution

  • It looks like my problem was with Resque.enqueue(ExcelImportJob, @upload.id).

    I changed my code to ExcelImportJob.perform_later(@upload.id) and now my code actually runs!

    I also added a resque.rake task to lib/tasks as described here: http://bica.co/2015/01/20/active-job-resque/.

    That link also notes how to use rails runner to call the job without running the full Rails server and triggering the job, which is useful for debugging.

    Strangely, I didn't quite manage to get the job to print anything to STDOUT as suggested by @hoffm but at least it led me down a good avenue of inquiry.

    I still don't fully understand why calling Resque.enqueue added my jobs to the queue, and indeed seemed to run them, without the code actually being executed. If someone has a better grasp of this and can explain it, that would be much appreciated.

    TL;DR: calling perform_later rather than Resque.enqueue fixed the problem but I don't know why.
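One likely piece of the puzzle, offered as a sketch rather than a confirmed diagnosis: a plain Resque job defines `perform` as a class method, and the worker calls it on the class, whereas an Active Job class such as `ExcelImportJob` defines `perform` as an instance method. The plain-Ruby example below illustrates that mismatch. `dispatch_like_resque`, `ResqueStyleJob`, and `ActiveJobStyleJob` are hypothetical names for illustration only and are not part of Resque or Rails.

```ruby
# Simplified stand-in for how a Resque worker runs a job: it takes the class
# named in the payload and calls a CLASS-level `perform` on it.
# (Hypothetical helper, not Resque's actual code.)
def dispatch_like_resque(job_class, *args)
  job_class.perform(*args)
end

# Plain Resque style: `perform` is a class method.
class ResqueStyleJob
  @queue = :default

  def self.perform(upload_id)
    "imported upload #{upload_id}"
  end
end

# Active Job style: `perform` is an INSTANCE method, as in ExcelImportJob.
class ActiveJobStyleJob
  def perform(upload_id)
    "imported upload #{upload_id}"
  end
end

# The class-level call reaches the job body.
resque_result = dispatch_like_resque(ResqueStyleJob, 15)

# The Active Job style class has no class-level `perform` to call.
has_class_perform = ActiveJobStyleJob.respond_to?(:perform)

# The job body only runs once an instance exists.
instance_result = ActiveJobStyleJob.new.perform(15)
```

Calling `ExcelImportJob.perform_later(@upload.id)` goes through Active Job's Resque adapter instead, which takes care of instantiating the job and calling the instance-level `perform` when the worker picks it up. This sketch does not explain why Resque reported the jobs as done rather than failed.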