ruby-on-rails, postgresql, heroku, passenger, actioncable

Passenger uses more PostgreSQL connections than expected


This is a hard issue that has been happening in production for a long time, and we have no clue where it's coming from. We can sometimes reproduce it on localhost, and Heroku Enterprise support has been clueless about it.

On our production database, we currently have the following setup:

  • Passenger Standalone, threading disabled, capped at 25 processes max. No minimum configured.
  • 3 web dynos

Running SELECT * FROM pg_stat_activity, grouping by client_addr and counting the connections per instance, shows that more than one PostgreSQL connection is open per Passenger process during our peak days.
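For reference, a query along these lines (just a sketch; adjust the filters to your database) gives the connection count per client_addr:

    -- Count open connections per client address (one address ≈ one dyno)
    SELECT client_addr, count(*) AS connections
    FROM pg_stat_activity
    WHERE datname = current_database()
    GROUP BY client_addr
    ORDER BY connections DESC;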

Assumptions:

  • A single client_addr corresponds to a single dyno (confirmed by Heroku staff)
  • Passenger does not spawn more than 25 processes at a time (confirmed with passenger-status during those peaks)

Here is a screenshot of what SELECT * FROM pg_stat_activity; looks like:

[screenshot of pg_stat_activity output]

In the screenshot, we can see 45 PostgreSQL connections coming from the same dyno that runs Passenger. If our previous logic held, there should be no more than one connection per Passenger process, so at most 25.

The logs don't show anything unusual; there is no mention of a dyno crash or a process crash.

Here is a screenshot of our passenger-status output for the same dyno (taken at a different time, just to show that no more than 25 processes are created for one dyno):

[screenshot of passenger-status output]

And finally, one of the responses we got from Heroku support (amazing support, by the way):

I have also seen previous reports of Passenger utilising more connections than expected, but most were closed due to difficulty reproducing, unfortunately.

The Passenger documentation explains that Passenger manages the ActiveRecord connections itself.

Any leads appreciated. Thanks!

Various information:

  • Ruby Version: 2.4.x
  • Rails Version: 5.1.x
  • Passenger Version: 5.3.x
  • PG Version: 10.x
  • ActiveRecord Version: 5.1.x

If you need any more info, just let me know in the comments and I will happily update this post.

One last thing: we use ActionCable. I've read somewhere that Passenger handles socket connections in an unusual way (it opens a somewhat hidden process to keep the connection alive). This is one of our leads, but so far we've had no luck reproducing it on localhost. If anyone can confirm how Passenger handles ActionCable connections, it would be much appreciated.

Update 1 (01/10/2018):

Experimented:


Solution

  • We finally managed to fix the issue we had with Passenger. We had actually been living with this issue for a very long time.

    The fix

    If you use ActionCable, and your default cable route is /cable, then change the Procfile from:

    web: bundle exec passenger start -p $PORT --max-pool-size $PASSENGER_MAX_POOL_SIZE
    

    to

    web: bundle exec passenger start -p $PORT --max-pool-size $PASSENGER_MAX_POOL_SIZE --unlimited-concurrency-path /cable
    

    Explanation

    Before the change, each socket connection (ActionCable) would occupy a whole Passenger process. But a socket should not take up an entire process: a single process can handle many open socket connections (more than ten thousand at a time for some big names). Fortunately, we have far fewer socket connections than that, but still.

    After the change, we basically told Passenger not to dedicate a whole process to each socket connection, but rather to dedicate one process to handling all the socket connections.
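    To verify the effect of the change, a breakdown of connections per dyno and state (again just a sketch; adjust the filters to your database) makes the drop in connections per client_addr visible:

    -- Connections per dyno (client_addr), split by state (active / idle)
    SELECT client_addr, state, count(*) AS connections
    FROM pg_stat_activity
    WHERE datname = current_database()
    GROUP BY client_addr, state
    ORDER BY connections DESC;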

    Documentation

    Some metrics, after 3 weeks with the fix

    • Number of forked processes on Passenger dramatically decreased (from 75 processes to ~ 15 processes)
    • Global memory usage on the web dynos dramatically decreased (related to previous point on forked Passenger processes)
    • The global number of PostgreSQL connections dramatically decreased (from 150 to ~30 connections) and has been steady for two days (even after deployments)
    • The number of PostgreSQL connections per dyno dramatically decreased (from ~50 per dyno to fewer than 10 per dyno)
    • The number of Redis connections decreased and has been steady for two days (even after deployment)
    • Average memory usage on PostgreSQL dramatically decreased and has been steady for two days.
    • The overall throughput is a bit higher than usual (throughput being the number of requests handled per minute)