mysql ruby sidekiq

Sidekiq: Find last job


I have two Sidekiq jobs. The first loads a feed of articles in JSON and splits it into multiple jobs. It also creates a log and stores a start_time.

class LoadFeed
  include Sidekiq::Worker

  def perform url
    log = Log.create! start_time: Time.now, url: url
    articles = load_feed(url) # this one loads the feed
    articles.each do |article|
      ProcessArticle.perform_async(article, log.id)
    end
  end
end

The second job processes an article and updates the end_time field of the previously created log, so that I can find out how long the whole process (loading the feed, splitting it into jobs, processing the articles) took.

class ProcessArticle
  include Sidekiq::Worker

  def perform data, log_id
    process(data)
    Log.find(log_id).update_attribute(:end_time, Time.now)
  end
end

But now I have some problems / questions:

  1. Log.find(log_id).update_attribute(:end_time, Time.now) isn't atomic: it is a read followed by a write, with the timestamp computed in Ruby. Because the jobs run concurrently, a job that took its timestamp earlier can write after a later one and leave an incorrect end_time. Is there a way to do an atomic update of a datetime field in MySQL with the current time?
  2. The feed can get pretty long (~800k articles), and updating a value 800k times when only the last update matters seems like a lot of unnecessary work. Is there a way to find out which job was the last one and update the end_time field only in that job?

Solution

  • For 1) you can skip the find and issue a single UPDATE statement, letting MySQL supply the current time:

    Log.where(id: log_id).update_all('end_time = now()')
    

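  • For 2) one way to solve this would be to update end_time only once all articles have been processed, for example by keeping a boolean per article that you can query. This does not reduce the number of queries, since every job still runs the check, but a read should usually be cheaper than 800k writes to the same log row. The snippet assumes the articles are persisted and that needs_processing is a scope returning the ones not yet processed: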

    if feed.articles.needs_processing.none?
      Log.where(id: log_id).update_all('end_time = now()')
    end
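
A different technique from the boolean check above, sketched here only as an illustration: keep a counter of pending articles and let the job that brings it to zero write end_time. In production the counter would typically live in something shared by all workers, for example a Redis key decremented with DECR, which is atomic. The sketch below stands in for that with an in-memory counter guarded by a Mutex so that it runs on its own. PendingCounter, process_article, and finished_at are hypothetical names, not part of Sidekiq or the code in the question.

```ruby
# In-memory stand-in for a shared atomic counter (e.g. Redis DECR).
# Hypothetical helper for illustration; not part of Sidekiq.
class PendingCounter
  def initialize(count)
    @count = count
    @lock = Mutex.new
  end

  # Atomically decrements the counter and returns the new value.
  def decrement
    @lock.synchronize { @count -= 1 }
  end
end

# Simulates ProcessArticle#perform: do the work first, then decrement.
# Only the job that sees zero records the end time.
def process_article(counter, finished_at)
  # process(data) would run here
  remaining = counter.decrement
  finished_at << Time.now if remaining.zero?
end

worker_count = 8
articles_per_worker = 125

# LoadFeed would set the counter before enqueueing any job.
counter = PendingCounter.new(worker_count * articles_per_worker)
finished_at = Queue.new # thread-safe; collects every end_time write

threads = Array.new(worker_count) do
  Thread.new do
    articles_per_worker.times { process_article(counter, finished_at) }
  end
end
threads.each(&:join)

puts "end_time written #{finished_at.size} time(s)"
```

Two caveats under these assumptions: the counter has to be initialised before the first ProcessArticle job can run, and the decrement should happen only after processing succeeds, otherwise a retried job would be counted twice. Sidekiq Pro (the commercial edition) offers Batches with completion callbacks, which cover the same need without a hand-rolled counter.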