Tags: ruby-on-rails, ruby, heroku, delayed-job, ruby-on-rails-5

How do I handle long running jobs on Heroku?


I want to use Heroku, but the fact that they restart dynos every 24 hours at random times is making things a bit difficult.

I have a series of jobs dealing with payment processing that are very important, and I want them backed by the database so they're 100% reliable. For this reason, I chose Delayed Job (DJ), which is slow.

Because I chose DJ, I also can't just push 5,000,000 events to the database at once (one per email sent).

Because of that, I have longer-running jobs (for example, sending 200,000 text messages over a few hours).

With these longer running jobs, it's more challenging to get them working if they're cut off right in the middle.

It appears Heroku sends SIGTERM and then expects the process to shut down within 30 seconds. This is not going to happen for my longer jobs.

Now I'm not sure how to handle them. The only way I can think of is to update the database immediately after sending each text (for example, setting an sms_sent_at column), but that means I'm destroying database performance with one update per message instead of sending a single update query for every batch.

This would be a lot better if I could schedule restarts. At least then I could do it at night, when I'm 99% sure I won't be running any jobs that take longer than 30 seconds to shut down.

Or, another way: can I 'listen' for SIGTERM within a long-running DJ job and at least abort the loop early so it can resume later?


Solution

  • Here's the proper answer: listen for SIGTERM (I'm using DJ here) and then bail out of the job gracefully by raising. It's important that the jobs are idempotent, so that running one again after an early exit is safe.

    See also: Long running delayed_job jobs stay locked after a restart on Heroku

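    class WithdrawPaymentsJob
    
      def perform
        term_now = false
        # Record the shutdown request, then pass the signal on to whatever
        # handler was installed before (trap returns the previous handler,
        # which is a String such as "DEFAULT" when no block was set).
        old_term_handler = trap('TERM') do
          term_now = true
          old_term_handler.call if old_term_handler.respond_to?(:call)
        end
    
        loop do
          puts 'doing long running job'
          sleep 1
    
          # Check the flag between units of work and stop early if it is set.
          raise 'Gracefully terminating job early...' if term_now
        end
      ensure
        # Always restore the previous TERM handler, even when raising.
        trap('TERM', old_term_handler)
      end
    
    end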
    

    With Que, the equivalent check inside the job's loop is:

        if Que.worker_count.zero?
          raise 'Gracefully terminating job early...'
        end
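
To make the idempotency point concrete, here is a rough sketch that combines the TERM flag with the sms_sent_at idea from the question. It is only an illustration under assumptions: Message, SendTextsJob, StopRequested, BATCH_SIZE and the sender callable are hypothetical names, and an in-memory array stands in for the database table. The job only picks up unsent messages, marks each batch as sent once it has been delivered, and checks the flag between batches, so a retried job resumes at the first unsent batch.

```ruby
# Hypothetical stand-in for a database row with an sms_sent_at column.
Message = Struct.new(:id, :sms_sent_at)

# Hypothetical error raised when the job stops early on SIGTERM.
class StopRequested < StandardError; end

class SendTextsJob
  BATCH_SIZE = 3 # hypothetical; a real job would use a much larger batch

  # sender is any callable that delivers one message (assumed SMS client).
  def initialize(messages, sender)
    @messages = messages
    @sender = sender
  end

  def perform
    term_now = false
    # Set a flag on TERM and pass the signal on to any previous Proc handler.
    old_handler = trap('TERM') do
      term_now = true
      old_handler.call if old_handler.respond_to?(:call)
    end

    # Only unsent messages are selected, so a rerun skips finished batches.
    unsent = @messages.select { |m| m.sms_sent_at.nil? }
    unsent.each_slice(BATCH_SIZE) do |batch|
      batch.each { |m| @sender.call(m) }
      mark_sent(batch)
      # Stop between batches, after the finished batch has been recorded.
      raise StopRequested, 'Gracefully terminating job early...' if term_now
    end
  ensure
    # Restore whatever TERM handler was in place before the job ran.
    trap('TERM', old_handler)
  end

  private

  # Stands in for a single UPDATE query covering the whole batch.
  def mark_sent(batch)
    now = Time.now
    batch.each { |m| m.sms_sent_at = now }
  end
end

# Usage: without a TERM signal, all five messages are sent in two batches.
messages = (1..5).map { |id| Message.new(id, nil) }
SendTextsJob.new(messages, ->(m) { puts "texting #{m.id}" }).perform
```

With one update per batch, a batch that is interrupted before it is marked would be sent again on retry, so the batch size has to be small enough to finish within the 30 second window and the occasional duplicate has to be acceptable (or deduplicated on the provider side).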