I have two Rails apps (production and staging environments) on a remote server.
I am currently experiencing a strange problem where Puma sometimes times out after I finish a deployment (via cap deploy). This has been happening for quite some time and is getting more frequent. Whenever it happens, I need to restart the Puma server, either with cap puma:stop and cap puma:start, or by manually running kill -9 <pid of puma instance>. In both cases, however, I first need to rm puma.sock from the shared/tmp/sockets directory.
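Before removing the socket file by hand, it can help to confirm that it really is stale. The following is a minimal sketch using only Ruby's standard library; the helper name socket_state is hypothetical, and nothing in it is specific to Puma or Capistrano. It only distinguishes a socket that accepts connections from one that refuses them or does not exist.

```ruby
require "socket"

# Classify a Unix domain socket path as :alive, :stale, or :missing.
# socket_state is a hypothetical helper name, not part of Puma.
def socket_state(path)
  UNIXSocket.new(path).close
  :alive      # a process is listening and accepted the connection
rescue Errno::ECONNREFUSED
  :stale      # the socket file exists, but nothing is listening on it
rescue Errno::ENOENT
  :missing    # there is no file at this path
end
```

For example, socket_state("shared/tmp/sockets/puma.sock") returning :stale would match the "111: Connection refused" case, where Nginx finds the file but no Puma process behind it.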
My production environment, on the other hand, did not have this issue. The only difference between the two is the number of commits: staging is several (~50) commits ahead. Earlier, when I merged staging into production and deployed, the same problem appeared in production. I rolled production back to the previous revision, restarted Puma, and the problem went away.
Note: cap puma:restart somehow does not solve this; I have to kill the current Puma instance and start a new one to make the problem go away.
My current setup is:
At the time the error occurred, nothing was logged in the Rails log, but Nginx logged some errors:
upstream timed out (110: Connection timed out) while reading response header from upstream
After waiting for 60 seconds, the 500 page is shown.
recv() failed (104: Connection reset by peer) while reading response header from upstream
The 500 page is shown instantly.
connect() to unix:/var/deploy/medictrust-staging/shared/tmp/sockets/puma.sock failed (111: Connection refused) while connecting to upstream
The 500 page is shown instantly.
The errors above happen randomly; sometimes it's connection timed out, sometimes it's connection refused. The most frequent one is connection timed out.
The strange thing is that Puma does not time out if I access my application via cURL. No changes were made to the Puma or Nginx config, so is it possible that this is caused by application code?
How do I make this problem go away for good?
For me, the web server was timing out because there were long-running queries all over the database, which hogged the available connections and made Puma wait for a connection to become available.
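The effect can be reproduced with a toy model. The sketch below is only an illustration under stated assumptions: ToyPool is a made-up stand-in built on Ruby's Queue, not ActiveRecord's connection pool, and the pool size and sleep times are arbitrary. It shows how a couple of slow holders are enough to make the next request time out while waiting for a connection.

```ruby
require "timeout"

# Toy connection pool: a fixed number of tokens in a thread-safe queue.
# Illustrative stand-in only; this is not ActiveRecord's pool.
class ToyPool
  def initialize(size)
    @slots = Queue.new
    size.times { |i| @slots << "conn-#{i}" }
  end

  # Check out a connection, waiting at most `wait` seconds for a free one.
  def with_connection(wait:)
    conn = Timeout.timeout(wait) { @slots.pop }
    begin
      yield conn
    ensure
      @slots << conn # always return the connection to the pool
    end
  end
end

pool = ToyPool.new(2)

# Two "slow queries" take both connections and hold them for a while.
hogs = 2.times.map do
  Thread.new { pool.with_connection(wait: 1) { sleep 0.5 } }
end
sleep 0.1 # give the slow threads time to check out their connections

# A third request cannot get a connection within its wait time.
result =
  begin
    pool.with_connection(wait: 0.2) { :served }
  rescue Timeout::Error
    :timed_out
  end

hogs.each(&:join)
puts result
```

Once the slow holders finish, the same call succeeds again, which is consistent with the problem disappearing after the database was restarted and the stuck queries were gone.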
As first aid, I restarted my MySQL server and it instantly worked. I regret that I didn't log slow queries, because the offending query must be the result of some bad code in my Rails app.
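To avoid the same regret next time, slow calls can be logged from the application side. This is a minimal stdlib-only sketch; log_if_slow, its label argument, and the threshold values are all hypothetical names chosen for illustration. In a real Rails app a similar check could hang off the sql.active_record notification, and MySQL has its own slow query log (the slow_query_log and long_query_time variables), which is probably the more direct tool.

```ruby
require "logger"

# Hypothetical helper: run a block and log a warning if it took longer
# than `threshold` seconds. `label` is any text identifying the query.
def log_if_slow(label, logger:, threshold: 1.0)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
ensure
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  logger.warn(format("slow query (%.2fs): %s", elapsed, label)) if elapsed > threshold
end

# Usage: the sleep stands in for a slow database call.
logger = Logger.new($stdout)
log_if_slow("example query", logger: logger, threshold: 0.05) { sleep 0.1 }
```

The helper returns whatever the block returns, so it can wrap an existing call without changing the surrounding code.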
This SO answer also helps: Getting “Lock wait timeout exceeded; try restarting transaction” even though I'm not using a transaction