ruby-on-rails postgresql heroku delayed-job unicorn

Rails on Heroku, Unicorn and Delayed Job: PG::ConnectionBad app/models/post.rb:93 PQconsumeInput() SSL SYSCALL error: Connection timed out


I'm running a Rails 4.1.9 app (Ruby 2.2.0) on Heroku with Unicorn, and using Delayed Job to process work in the background. At any given moment I have about 8 workers running.

Occasionally I'll see the following error in my logs:

PG::ConnectionBad
PQconsumeInput() SSL SYSCALL error: Connection timed out
app/models/post.rb:93 build

These errors always come from background jobs I am running.

As far as I understand it, Delayed Job does not actually use Unicorn to run workers; it's just a single worker process per worker dyno. Yet all the issues I see around this error seem to stem from Unicorn.

My unicorn.rb file looks like:

worker_processes 3
timeout 30
preload_app true
listen ENV['PORT'], backlog: Integer(ENV['UNICORN_BACKLOG'] || 200)

before_fork do |server, worker|

  Signal.trap 'TERM' do
    puts 'Unicorn master intercepting TERM and sending myself QUIT instead'
    Process.kill 'QUIT', Process.pid
  end

  defined?(ActiveRecord::Base) and
    ActiveRecord::Base.connection.disconnect!
end

after_fork do |server, worker|

  Signal.trap 'TERM' do
    puts 'Unicorn worker intercepting TERM and doing nothing. Wait for master to send QUIT'
  end

  defined?(ActiveRecord::Base) and
    ActiveRecord::Base.establish_connection
end

I googled around and came across the following links:

The first link tells me to add my config files to Unicorn, the second tells me to change my DB_REAPING_FREQUENCY, and the third tells me to upgrade my DB (I already have the $50-a-month DB).

Any idea about what might be going wrong here and where to start fixing it? I'm not even sure where to look.


Solution

  • This ended up actually just being a job that took FOREVER (like 4 minutes) to run, due to some inefficient querying on my end.

    It took me a remarkably long time to figure out which job it was, for such a simple and dumb cause.

    I just waited until I saw that the number of jobs wasn't decreasing, then ran this code:

    dj = Delayed::Job.where('run_at is not null').sample

    then read dj.handler to see which method was being called and on which object, ran that method myself, saw that it was really slow, and fixed it.
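
    The same hunt can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the exact code used above: JobRow is a hypothetical stand-in for a row of the delayed_jobs table, and long_running and time_job are made-up helper names. It filters on locked_at, the column Delayed Job sets when a worker picks up a job, and then times a suspect call by hand.

```ruby
require 'benchmark'

# Hypothetical stand-in for a row of the delayed_jobs table.
# In a real app this would be a Delayed::Job record.
JobRow = Struct.new(:id, :locked_at, :handler)

# Return the rows that a worker has held for longer than `threshold` seconds.
# Rows with a nil locked_at have not been picked up yet and are skipped.
def long_running(rows, now: Time.now, threshold: 60)
  rows.select { |row| row.locked_at && (now - row.locked_at) > threshold }
end

# Run the given block and return the elapsed wall-clock time in seconds.
def time_job
  Benchmark.realtime { yield }
end

now = Time.now
rows = [
  JobRow.new(1, nil,       'queued, not picked up yet'),
  JobRow.new(2, now - 10,  'picked up 10 seconds ago'),
  JobRow.new(3, now - 240, 'picked up 4 minutes ago')
]

suspects = long_running(rows, now: now)
puts "suspect job ids: #{suspects.map(&:id).inspect}"

# Stand-in for running the slow method yourself in a console.
elapsed = time_job { sleep 0.05 }
puts format('took %.2fs', elapsed)
```

    Against a real database, the equivalent filter would presumably be a where clause on locked_at rather than an in-memory select. Timing the suspect method by hand then confirms whether the job itself, and not the connection setup, is the slow part.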