Tags: ruby-on-rails, nginx, capistrano, puma

Rails with Puma occasional timeout


I have two Rails apps (production and staging environments) on a remote server.

I am experiencing a strange problem where Puma sometimes times out after I finish a deployment (via cap deploy). This has been happening for quite some time and is getting more frequent. Whenever it happens, I need to restart the Puma server, either with cap puma:stop followed by cap puma:start, or by manually running kill -9 <pid of puma instance>. In both cases, I first need to rm puma.sock from the shared/tmp/sockets directory.

My production environment, on the other hand, does not have this issue. The only difference between the two is the number of commits: staging is several (~50) commits ahead. Earlier, when I merged staging into production and deployed, the same problem appeared in production. I rolled production back to the previous revision, restarted Puma, and the problem went away.

Note: cap puma:restart somehow does not solve this; I have to kill the current Puma instance and start a new one to make the problem go away.
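Before removing puma.sock by hand, it may help to check whether anything is still listening on it. The following is a minimal sketch using only Ruby's standard library; the method name puma_socket_status and the returned symbols are made up for illustration and are not part of Puma or Capistrano.

```ruby
require "socket"

# Hypothetical helper: reports the state of a Unix socket file.
#   :missing      - no file at that path
#   :not_a_socket - a file exists, but it is not a socket
#   :listening    - a process accepted the connection attempt
#   :stale        - the socket file exists, but nothing is listening on it
def puma_socket_status(path)
  return :missing unless File.exist?(path)
  return :not_a_socket unless File.socket?(path)

  # Connecting succeeds only if some process is listening on the socket.
  UNIXSocket.new(path).close
  :listening
rescue Errno::ECONNREFUSED
  # Leftover socket file from a process that is no longer listening.
  :stale
end
```

For example, puma_socket_status("shared/tmp/sockets/puma.sock") could be run from the deploy directory. Note that :listening only means the connection was accepted; a Puma process that is hung while waiting on something else would still report :listening.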

My current setup is:

  • Rails 4.1
  • Puma
  • Nginx
  • Capistrano 3

When the error occurs, nothing is logged in the Rails log, but Nginx logs one of the following errors:

  • upstream timed out (110: Connection timed out) while reading response header from upstream (the 500 page is shown after waiting for 60 seconds).
  • recv() failed (104: Connection reset by peer) while reading response header from upstream (the 500 page is shown instantly).
  • connect() to unix:/var/deploy/medictrust-staging/shared/tmp/sockets/puma.sock failed (111: Connection refused) while connecting to upstream (the 500 page is shown instantly).

The errors above occur randomly; sometimes it is connection timed out, sometimes connection refused. The most frequent one is connection timed out.
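To see which of the three errors actually dominates, the Nginx error log can be tallied by error type. This is a rough sketch in plain Ruby; the constant and method names are invented for illustration, and the patterns are based only on the three messages quoted above.

```ruby
# Patterns for the three upstream errors quoted in the question.
UPSTREAM_ERRORS = {
  timed_out: /upstream timed out \(110: Connection timed out\)/,
  reset:     /recv\(\) failed \(104: Connection reset by peer\)/,
  refused:   /connect\(\) to unix:\S+ failed \(111: Connection refused\)/
}.freeze

# Hypothetical helper: counts how often each known upstream error
# appears in the given log lines. Lines matching none are ignored.
def tally_upstream_errors(lines)
  counts = Hash.new(0)
  lines.each do |line|
    kind, _pattern = UPSTREAM_ERRORS.find { |_, pattern| line =~ pattern }
    counts[kind] += 1 if kind
  end
  counts
end
```

It could be called as tally_upstream_errors(File.foreach(log_path)), where log_path is wherever your Nginx error log lives (the location depends on your Nginx configuration).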

The strange thing is that Puma does not time out when I access my application via cURL. No changes were made to the Puma or Nginx config, so is it possible that this is caused by application code?

How do I make this problem go away for good?


Solution

  • For me, the web server was timing out because long-running queries all over the database were hogging the available connections, which made Puma wait for a connection to become available.

    As first aid, I restarted my MySQL server and everything worked again instantly. I regret that I didn't log slow queries, because the offending query must be the result of some bad code in my Rails app.

    Additionally, this SO answer also helped: Getting “Lock wait timeout exceeded; try restarting transaction” even though I'm not using a transaction
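The mechanism described in this answer, where slow queries hold every connection and new requests wait until they time out, can be illustrated with a small simulation. This is only a sketch using Ruby's standard library: TinyPool and simulate_request are hypothetical stand-ins, not ActiveRecord's or Puma's actual implementation, and sleep stands in for a long-running query.

```ruby
require "timeout"

# Hypothetical stand-in for a database connection pool.
class TinyPool
  def initialize(size)
    @free = Queue.new
    size.times { |i| @free << "conn-#{i}" }
  end

  # Waits up to `wait` seconds for a free connection, then yields it.
  # Raises Timeout::Error if none becomes free in time.
  def with_connection(wait:)
    conn = Timeout.timeout(wait) { @free.pop }
    begin
      yield conn
    ensure
      @free << conn # always return the connection to the pool
    end
  end
end

# Starts `busy` slow queries that each hold a connection for `hold_for`
# seconds, then issues one more request and reports :ok or :timed_out.
def simulate_request(pool_size:, busy:, hold_for:, wait:)
  pool = TinyPool.new(pool_size)
  busy = [busy, pool_size].min # cannot hold more connections than exist
  ready = Queue.new

  holders = Array.new(busy) do
    Thread.new do
      pool.with_connection(wait: wait) do
        ready << :held
        sleep hold_for # simulated long-running query
      end
    end
  end
  busy.times { ready.pop } # wait until every slow query holds a connection

  outcome = begin
    pool.with_connection(wait: wait) { :ok }
  rescue Timeout::Error
    :timed_out
  end
  holders.each(&:join)
  outcome
end
```

With a pool of 2 and both connections held for 0.5 seconds, a request willing to wait only 0.1 seconds gives up, which mirrors the timeout Nginx reports. If one connection is left free, the same request succeeds immediately.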