Search code examples
phpsessioncakephpmemcachedamazon-elasticache

CakeSession::_startSession - Slow on Elasticache


We're running CakePHP 2.9, and using an Elasticache Cluster for Session Storage (which is stored via Memcached).

We've disabled PHP's in-built session garbage collection as recommended here: https://tideways.io/profiler/blog/php-session-garbage-collection-the-unknown-performance-bottleneck

session.gc_probability = 0

We have also set the probability setting to 0 within CakePHP's Cache config.

However; we're still having issues whereby occasionally we experience major slow-downs in CakeSession::_startSession, as reported by New Relic:

Slow CakeSession::_startSession

The Elasticache Cluster is not showing any metrics which would suggest there is a problem (unless there's some metric I'm not understanding correctly).

Any suggestions on how to diagnose this cause?


Solution

  • This issue appears to have been caused by session locking, something I wasn't even aware existed.

    This article explains how and why Session Locking exists: https://ma.ttias.be/php-session-locking-prevent-sessions-blocking-in-requests/

    What's important is that memcached has session locking turned on by default.

    In our case, we don't use Sessions for much other than Authentication, our application doesn't use the session information for storing User State (like a shopping cart would), so we simply disabled session locking with the php.ini setting:

    memcached.sess_locking = 0

    Since making this change, we've seeing a huge improvement in response times (~200ms average to ~160). This is especially noticeable on AJAX-heavy pages which load a lot of data concurrently. Previously it seems these requests were being loaded sequentially however they're now all serviced simultaneously, the difference in speed is incredible.

    While there are likely some edge cases we'll uncover over the coming weeks/months as a result of turning off session locking, this appears to be the cause of the issue, and this change seems to have stopped the problem from occurring.