I have two Sidekiq jobs. The first one loads a JSON feed of articles and splits it into multiple jobs. It also creates a log and stores a `start_time`:
```ruby
class LoadFeed
  include Sidekiq::Worker

  def perform(url)
    log = Log.create!(start_time: Time.now, url: url)
    articles = load_feed(url) # loads and parses the feed
    articles.each do |article|
      ProcessArticle.perform_async(article, log.id)
    end
  end
end
```
The second job processes an article and updates the `end_time` field of the previously created log, so I can find out how long the whole process (loading the feed, splitting it into jobs, processing the articles) took:
```ruby
class ProcessArticle
  include Sidekiq::Worker

  def perform(data, log_id)
    # ... process the article in `data` ...
    Log.find(log_id).update_attribute(:end_time, Time.now)
  end
end
```
But now I have two problems/questions:

1) `Log.find(log_id).update_attribute(:end_time, Time.now)` isn't atomic, and because the jobs run asynchronously, this could lead to incorrect `end_time` values. Is there a way to do an atomic update of a `datetime` field in MySQL with the current time?
2) The feed can get pretty long (~800k articles), and updating a value 800k times when only the last update matters seems like a lot of unnecessary work. Any ideas how to find out which job is the last one, and update the `end_time` field only in that job?
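For 1), you could do the update with one query fewer and let MySQL determine the time:

```ruby
Log.where(id: log_id).update_all('end_time = now()')
```

This sends a single `UPDATE` statement (no `SELECT` first), and the timestamp is taken by MySQL when the statement runs instead of being computed in Ruby beforehand.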
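To illustrate why it matters where the timestamp is computed, here is a toy Ruby simulation of two jobs whose writes reach the database in the opposite order from the one in which they read the clock. No MySQL or ActiveRecord is involved; `SimulatedLog` and its methods are hypothetical names. It is deliberately simplified and ignores transactions and lock waits, so treat it as an illustration rather than a guarantee about MySQL's behaviour.

```ruby
# Hypothetical stand-in for a single row of the logs table (not ActiveRecord).
class SimulatedLog
  attr_reader :end_time

  def initialize
    @clock = 0       # stands in for the wall clock; ticks on every call to #now
    @end_time = nil
  end

  def now
    @clock += 1
  end

  # Like update_attribute(:end_time, Time.now): the value was computed by the
  # job before the write reached the database.
  def write_value(time)
    @end_time = time
  end

  # Like update_all('end_time = now()'): the value is taken when the write executes.
  def write_now
    @end_time = now
  end
end

# Client-side timestamp: A reads the clock first, but B's write lands first.
client = SimulatedLog.new
a_time = client.now         # job A: Time.now => 1
b_time = client.now         # job B: Time.now => 2
client.write_value(b_time)  # B's UPDATE executes
client.write_value(a_time)  # A's UPDATE executes last and overwrites with 1
puts client.end_time        # prints 1, older than B's finish time

# Server-side timestamp: same arrival order, but each write stamps its own time.
server = SimulatedLog.new
server.now                  # job A finishes its work (clock => 1)
server.now                  # job B finishes its work (clock => 2)
server.write_now            # B's UPDATE executes (clock => 3)
server.write_now            # A's UPDATE executes (clock => 4)
puts server.end_time        # prints 4, the time of the last write executed
```

With a client-side `Time.now`, the last write wins even when it carries the older time; with `now()` evaluated by the database, the stored value reflects when the final `UPDATE` executed.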
For 2), one way to solve this would be to update your end time only once all articles have been processed, for example by keeping a boolean flag per article that you can query. This does not reduce the number of queries, but it replaces most of the writes with reads, which should perform better:

```ruby
# `needs_processing`: a scope selecting articles whose flag is still unset
if feed.articles.needs_processing.none?
  Log.where(id: log_id).update_all('end_time = now()')
end
```
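A minimal sketch of that check, assuming an in-memory model: a `Hash` of per-article flags guarded by a `Mutex` stands in for the articles table, and a handful of threads stand in for Sidekiq workers. `FakeFeed` and `process` are hypothetical names, not part of Sidekiq or ActiveRecord.

```ruby
# Toy model of "only write end_time once nothing is left to process".
class FakeFeed
  attr_reader :end_time, :end_time_writes

  def initialize(article_ids)
    @mutex = Mutex.new
    @pending = article_ids.to_h { |id| [id, true] } # true = needs processing
    @end_time = nil
    @end_time_writes = 0
  end

  # Plays the role of ProcessArticle#perform for one article.
  def process(id)
    # ... the real per-article work would happen here ...
    @mutex.synchronize { @pending[id] = false } # mark this article as processed

    # Equivalent of `feed.articles.needs_processing.none?`
    none_left = @mutex.synchronize { @pending.values.none? }
    return unless none_left

    # Only jobs that find an empty backlog write end_time.
    @mutex.synchronize do
      @end_time = Time.now
      @end_time_writes += 1
    end
  end
end

feed = FakeFeed.new((1..1_000).to_a)
threads = (1..1_000).each_slice(100).map do |slice|
  Thread.new { slice.each { |id| feed.process(id) } }
end
threads.each(&:join)

# At least 1 (the job that marks the last article), at most one per thread (10 here).
puts feed.end_time_writes
```

Every job still runs the check, which is why the number of queries stays the same, but only the job(s) that find nothing left to process write `end_time`. If two jobs finish at nearly the same moment, both may see an empty backlog and write; that is harmless here because both write practically the same time.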