Hard issue happening in production for a long time, we have no clue about where it's coming from. Can sometimes reproduces it on localhost, Heroku Enterprise support has been clue-less about this.
On our production database, we currently have the following setup:
a SELECT * FROM pg_stat_activity GROUP BY client_addr
and count the number of connections per instance shows that more than 1 PSQL connection is opened for one passenger process during our peak days.
Assumptions:
passenger-status
during those peaks)Here is a screenshot of what looks the SELECT * FROM pg_stat_activity;
:
In the screenshot, we can see that there are 45 psql connections coming from the same dyno that runs passenger. If we followed our previous logic, it should not have more than 1 connection per Passenger process, so 25.
The logs doesn't look unusual, nothing mentioning either a dyno crash / process crash.
Here is a screenshot of our passenger status for the same dyno (different time, just to prove that there are not more processes than 25 created for one dyno):
And finally one of the response we got from the Heroku support (Amazing support btw)
I have also seen previous reports of Passenger utilising more connections than expected, but most were closed due to difficulty reproducing, unfortunately.
In the Passenger documentation, it's explained that Passenger handle itself the ActiveRecord connections.
Any leads appreciated. Thanks!
Various information:
2.4.x
5.1.x
5.3.x
10.x
5.1.x
If you need any more info, just let me know in the comments, I will happily update this post.
One last thing: We use ActionCable. I've read somewhere that passenger is handling weirdly the socket connections (Opens a somewhat hidden process to keep the connection alive). This is one of our leads, but so far no luck in reproducing it on localhost. If anyone can confirm how Passenger handles ActionCable connections, it would be much appreciated.
Update 1 (01/10/2018):
Experimented:
We finally managed to fix the issue we had on Passenger. We have had this issue for a very long time actually.
The fix
If you use ActionCable, and your default cable route is /cable
, then change the Procfile from:
web: bundle exec passenger start -p $PORT --max-pool-size $PASSENGER_MAX_POOL_SIZE
to
web: bundle exec passenger start -p $PORT --max-pool-size $PASSENGER_MAX_POOL_SIZE --unlimited-concurrency-path /cable
Explanation
Before the change, each socket connection (ActionCable) would take one single process in Passenger. But a Socket is actually something that should not take a whole process. A process can handle many many open socket connection. (Many is more than 10thousands at the same time for some big names). Fortunately, we have much lower socket connections, but still.
After the change, we basically told Passenger to not take a whole process to handle one socket connection, but rather dedicate a whole process to handle all the socket connections.
Documentation
Some metrics, after 3 weeks with the fix