Tags: ruby-on-rails, ruby, delayed-job

Requeue or evaluate delayed job from inside?


Is there a way to determine the status of a running delayed_job job from inside the job task itself? I have a job that interacts with a service that can be pretty flaky and for a certain class of connection failures I'd like to requeue the job and only raise an exception if a connection failure occurs again at the retry limit.

Pseudo-code to demonstrate what I want to be able to do:

def do_thing
  service.send_stuff(args)
rescue Exception1, Exception2
  if job.retries == JOBS_MAX
    raise
  else
    job.requeue
  end
end

I don't want to raise an exception on any failure because generally the job will be completed okay on a later retry and it is just making noise for me. I do want to know if it is never completed, though.


Solution

  • If the Delayed Job runner gets to the end of your perform method without an exception, the run is considered successful and the job is removed from the queue. So to get a retry you just have to stop it from getting to the end, that is, let the exception propagate. There isn't a requeue, and even if there were, it would create a new record with new attributes. So you may want to rethink whatever it is that is causing the job to notify you about exceptions. You could, for example, add a condition that decides when to notify you...

    Potential Solutions

    You can get the default JOBS_MAX (as you pseudo-coded it) with Delayed::Worker.max_attempts, or you can set your own limit per job by defining a max_attempts method on the job, e.g.:

    # Fail permanently after the 10th failure for this job
    def max_attempts
      10
    end
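
For context, here is a minimal sketch of where that method would live. The names FlakyConnectionError, FlakyService and SendStuffJob are hypothetical stand-ins invented for this example; only perform and max_attempts are names Delayed Job looks for on the payload object.

```ruby
# Hypothetical error class standing in for the service's connection failures.
class FlakyConnectionError < StandardError; end

# Hypothetical service wrapper; a real one would talk to the remote service.
module FlakyService
  def self.send_stuff(args)
    raise FlakyConnectionError, "connection dropped" if args[:fail]

    :sent
  end
end

# Hypothetical payload object for Delayed Job.
class SendStuffJob
  def initialize(args)
    @args = args
  end

  # No rescue here: letting the exception propagate keeps the job from
  # reaching the end of perform, so the worker records a failed attempt
  # and retries later.
  def perform
    FlakyService.send_stuff(@args)
  end

  # Per-job override of the Delayed::Worker.max_attempts default.
  def max_attempts
    10
  end
end
```

With the delayed_job gem loaded, a job like this would typically be queued with Delayed::Job.enqueue(SendStuffJob.new(args)).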
    

    That is, this method will be usable given the following:

    You can also make use of callback hooks. Delayed Job will call back to your payload object via the error method if it is defined, so you may be able to use that method to notify you only of exceptions raised beyond a given attempt number. To do that...

    Within the callback, the Delayed::Job object itself is passed as the first argument:

    def error(job, exception)
      job.attempts # number of attempts recorded on the job so far
      # If this failure used up the last allowed attempt (compare job.attempts
      # with max_attempts), send the exception notification
      # or whatever you want here...
    end
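
A fuller sketch of that hook might look like the following. FailureNotifier and QuietUntilLastAttemptJob are hypothetical names made up for this example. It also assumes that job.attempts, as seen inside error, counts only earlier failures (the worker increments it after the hook has run); that is worth checking against the delayed_job version you have installed.

```ruby
# Hypothetical stand-in for a real notifier (mailer, error tracker, ...).
module FailureNotifier
  def self.deliveries
    @deliveries ||= []
  end

  def self.notify(message)
    deliveries << message
  end
end

# Hypothetical payload object that stays quiet until its last attempt fails.
class QuietUntilLastAttemptJob
  def max_attempts
    3
  end

  def perform
    # Call the flaky service here and let connection errors propagate.
  end

  # Called by Delayed Job when perform raises.
  def error(job, exception)
    return unless last_attempt?(job)

    FailureNotifier.notify(
      "Giving up after #{job.attempts + 1} attempts: #{exception.message}"
    )
  end

  private

  # ASSUMPTION: job.attempts counts earlier failures only, so the attempt
  # that just failed is number job.attempts + 1.
  def last_attempt?(job)
    job.attempts + 1 >= max_attempts
  end
end
```

Earlier failures still cause a retry because the exception reaches the worker; the hook only decides whether anyone is told about it.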
    

    So you can use the callbacks to start adding logic on when to notify yourself and when not to. I might even suggest making a base set of functionality that you can include into all of your payload objects to do these things... but that's up to you and your design.
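
One way such a shared base could look is a small mixin. This is only a sketch: QuietRetries, on_final_failure and attempts_limit are hypothetical names, and the same assumption applies that job.attempts inside error does not yet include the attempt that just failed.

```ruby
# Hypothetical mixin: include it in any payload object to be told only
# about the failure that exhausts the retry limit.
module QuietRetries
  # delayed_job's documented default for Delayed::Worker.max_attempts;
  # used here only when the gem is not loaded.
  FALLBACK_MAX_ATTEMPTS = 25

  # Delayed Job hook, called whenever perform raises.
  def error(job, exception)
    # ASSUMPTION: job.attempts counts earlier failures only.
    on_final_failure(job, exception) if job.attempts + 1 >= attempts_limit
  end

  # Override in the including class to send mail, page someone, etc.
  def on_final_failure(job, exception)
    warn "Job failed for good after #{job.attempts + 1} attempts: #{exception.message}"
  end

  private

  # Prefer the job's own max_attempts, then the worker default.
  def attempts_limit
    return max_attempts if respond_to?(:max_attempts)
    return Delayed::Worker.max_attempts if defined?(Delayed::Worker)

    FALLBACK_MAX_ATTEMPTS
  end
end
```

Delayed Job also documents a failure(job) hook that runs when a job fails permanently; if that fits your case, it may remove the need for the attempt arithmetic altogether.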