I have two Sidekiq jobs. The first loads a JSON feed of articles and splits it into multiple jobs. It also creates a log and stores a start_time.
class LoadFeed
  include Sidekiq::Worker

  def perform(url)
    log = Log.create!(start_time: Time.now, url: url)
    articles = load_feed(url) # this one loads the feed
    articles.each do |article|
      ProcessArticle.perform_async(article, log.id)
    end
  end
end
The second job processes an article and updates the end_time field of the previously created log, so I can find out how long the whole process (loading the feed, splitting it into jobs, processing the articles) took.
class ProcessArticle
  include Sidekiq::Worker

  def perform(data, log_id)
    process(data)
    Log.find(log_id).update_attribute(:end_time, Time.now)
  end
end
But now I have some problems / questions:

1) Log.find(log_id).update_attribute(:end_time, Time.now) isn't atomic, and because of the async behaviour of the jobs this could lead to incorrect end_time values. Is there a way to do an atomic update of a datetime field in MySQL with the current time?

2) The feed can get pretty long (~800k articles), and updating a value 800k times when you only need the last one seems like a lot of unnecessary work. Any ideas how to find out which job is the last one, and only update the end_time field in that job?
For 1) you could do the update with one less query and let MySQL supply the current time:
Log.where(id: log_id).update_all('end_time = now()')
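Applied to the worker from the question, that could look roughly like this (keeping the process helper as-is):

class ProcessArticle
  include Sidekiq::Worker

  def perform(data, log_id)
    process(data)
    # A single UPDATE statement; NOW() is evaluated by MySQL itself,
    # so concurrent jobs don't write back a stale Ruby timestamp.
    Log.where(id: log_id).update_all('end_time = now()')
  end
end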
For 2) one way to solve this would be to update the end time only once all articles have been processed, for example by having a boolean flag on each article that you can query. This does not reduce the number of queries, but it should perform much better than 800k writes.
if feed.articles.needs_processing.none?
  Log.where(id: log_id).update_all('end_time = now()')
end
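A more complete sketch of that idea, assuming the articles are persisted records belonging to a Feed, with a processed boolean column and a needs_processing scope (none of which exist in the question's code, so the names are purely illustrative):

class Article < ActiveRecord::Base
  belongs_to :feed
  # Hypothetical scope: articles no job has finished yet.
  scope :needs_processing, -> { where(processed: false) }
end

class ProcessArticle
  include Sidekiq::Worker

  def perform(article_id, log_id)
    article = Article.find(article_id)
    process(article)

    # Mark this article as done so the scope below can see it.
    article.update_column(:processed, true)

    # Only when no unprocessed articles remain does a job write the
    # end_time, instead of every one of the ~800k jobs doing so.
    if article.feed.articles.needs_processing.none?
      Log.where(id: log_id).update_all('end_time = now()')
    end
  end
end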