Search code examples
memorymemory-leakslimitvarnish

Varnish 6 LTS /w CentOS 8 not respecting memory limits?


I hope I find you allright.

So, I had Varnish 6.x LTS on Centos 7 on a server with 64Gb of RAM, practically working perfectly as configured, and I updgraded my setup to a server with 128Gb of RAM and CentOS 8. With this setup, I often get Varnish restarts due to OOM, even though the configuration is as follows:

ExecStart=/usr/sbin/varnishd \
          -a :6081 \
          -f /etc/varnish/default.vcl \
          -l 200m \
          -s malloc,38G \
          -s Transient=malloc,8G \
          -p thread_pool_min=300 \
          -p thread_pool_max=500 \
          -p workspace_client=256k

On the old server I had a similar setup with

-s malloc, 26Gb

and it worked fine. The Transient part of the config parameter is only my attempt a few days back, to limit the transient cache and see if it helps, but no luck with that, in a sence that it still give indications that it will soon go OOM. I read a lot of articles about setting this up, and also I count on ~50% or even more overhead, but in this case, it just seems to go up without any limit. I could really use a 4-eye method on what I'm missing here, or at least a point where to go on.

My current varnishstat:

MGT.uptime              269509         1.00 Management process uptime
MGT.child_start              1         0.00 Child process started
MGT.child_exit               0         0.00 Child process normal exit
MGT.child_stop               0         0.00 Child process unexpected exit
MGT.child_died               0         0.00 Child process died (signal)
MGT.child_dump               0         0.00 Child process core dumped
MGT.child_panic              0         0.00 Child process panic
MAIN.summs             7869381        29.20 stat summ operations
MAIN.uptime             269510         1.00 Child process uptime
MAIN.sess_conn          184459         0.68 Sessions accepted
MAIN.sess_drop               0         0.00 Sessions dropped
MAIN.sess_fail               0         0.00 Session accept failures
MAIN.sess_fail_econnaborted            0         0.00 Session accept failures: connection aborted
MAIN.sess_fail_eintr                   0         0.00 Session accept failures: interrupted system call
MAIN.sess_fail_emfile                  0         0.00 Session accept failures: too many open files
MAIN.sess_fail_ebadf                   0         0.00 Session accept failures: bad file descriptor
MAIN.sess_fail_enomem                  0         0.00 Session accept failures: not enough memory
MAIN.sess_fail_other                   0         0.00 Session accept failures: other
MAIN.client_req_400                    0         0.00 Client requests received, subject to 400 errors
MAIN.client_req_417                    0         0.00 Client requests received, subject to 417 errors
MAIN.client_req                  2373462         8.81 Good client requests received
MAIN.cache_hit                    500010         1.86 Cache hits
MAIN.cache_hit_grace              106009         0.39 Cache grace hits
MAIN.cache_hitpass                     0         0.00 Cache hits for pass.
MAIN.cache_hitmiss                     0         0.00 Cache hits for miss.
MAIN.cache_miss                  1611440         5.98 Cache misses
MAIN.backend_conn                  12617         0.05 Backend conn. success
MAIN.backend_unhealthy                 0         0.00 Backend conn. not attempted
MAIN.backend_busy                      0         0.00 Backend conn. too many
MAIN.backend_fail                      0         0.00 Backend conn. failures
MAIN.backend_reuse               1966574         7.30 Backend conn. reuses
MAIN.backend_recycle             1976920         7.34 Backend conn. recycles
MAIN.backend_retry                     1         0.00 Backend conn. retry
MAIN.fetch_head                        0         0.00 Fetch no body (HEAD)
MAIN.fetch_length                 475423         1.76 Fetch with Length
MAIN.fetch_chunked               1497899         5.56 Fetch chunked
MAIN.fetch_eof                         0         0.00 Fetch EOF
MAIN.fetch_bad                         0         0.00 Fetch bad T-E
MAIN.fetch_none                     1325         0.00 Fetch no body
MAIN.fetch_1xx                         0         0.00 Fetch no body (1xx)
MAIN.fetch_204                         0         0.00 Fetch no body (204)
MAIN.fetch_304                      3073         0.01 Fetch no body (304)
MAIN.fetch_failed                   1462         0.01 Fetch failed (all causes)
MAIN.fetch_no_thread                   0         0.00 Fetch failed (no thread)
MAIN.pools                             2          .   Number of thread pools
MAIN.threads                         600          .   Total number of threads
MAIN.threads_limited                   0         0.00 Threads hit max
MAIN.threads_created                 923         0.00 Threads created
MAIN.threads_destroyed               323         0.00 Threads destroyed
MAIN.threads_failed                    0         0.00 Thread creation failed
MAIN.thread_queue_len                  0          .   Length of session queue
MAIN.busy_sleep                      878         0.00 Number of requests sent to sleep on busy objhdr
MAIN.busy_wakeup                     878         0.00 Number of requests woken after sleep on busy objhdr
MAIN.busy_killed                       0         0.00 Number of requests killed after sleep on busy objhdr
MAIN.sess_queued                     134         0.00 Sessions queued for thread
MAIN.sess_dropped                      0         0.00 Sessions dropped for thread
MAIN.req_dropped                       0         0.00 Requests dropped
MAIN.n_object                     218508          .   object structs made
MAIN.n_vampireobject                   0          .   unresurrected objects
MAIN.n_objectcore                 218665          .   objectcore structs made
MAIN.n_objecthead                 218773          .   objecthead structs made
MAIN.n_backend                         1          .   Number of backends
MAIN.n_expired                   1392690         5.17 Number of expired objects
MAIN.n_lru_nuked                       0         0.00 Number of LRU nuked objects
MAIN.n_lru_moved                  449086         1.67 Number of LRU moved objects
MAIN.n_lru_limited                     0         0.00 Reached nuke_limit
MAIN.losthdr                           0         0.00 HTTP header overflows
MAIN.s_sess                       184459         0.68 Total sessions seen
MAIN.s_pipe                            0         0.00 Total pipe sessions seen
MAIN.s_pass                       262012         0.97 Total pass-ed requests seen
MAIN.s_fetch                     1873452         6.95 Total backend fetches initiated
MAIN.s_synth                         490         0.00 Total synthetic responses made
MAIN.s_req_hdrbytes           1975945691      7331.62 Request header bytes
MAIN.s_req_bodybytes             4279860        15.88 Request body bytes
MAIN.s_resp_hdrbytes          1272508062      4721.56 Response header bytes
MAIN.s_resp_bodybytes       1536158038935   5699818.33 Response body bytes
MAIN.s_pipe_hdrbytes                   0         0.00 Pipe request header bytes
MAIN.s_pipe_in                         0         0.00 Piped bytes from client
MAIN.s_pipe_out                        0         0.00 Piped bytes to client
MAIN.sess_closed                     356         0.00 Session Closed
MAIN.sess_closed_err               86861         0.32 Session Closed with error
MAIN.sess_readahead                    0         0.00 Session Read Ahead
MAIN.sess_herd                   1856881         6.89 Session herd
MAIN.sc_rem_close                  97588         0.36 Session OK  REM_CLOSE
MAIN.sc_req_close                      0         0.00 Session OK  REQ_CLOSE
MAIN.sc_req_http10                     0         0.00 Session Err REQ_HTTP10
MAIN.sc_rx_bad                         0         0.00 Session Err RX_BAD
MAIN.sc_rx_body                        0         0.00 Session Err RX_BODY
MAIN.sc_rx_junk                        0         0.00 Session Err RX_JUNK
MAIN.sc_rx_overflow                    0         0.00 Session Err RX_OVERFLOW
MAIN.sc_rx_timeout                 86861         0.32 Session Err RX_TIMEOUT
MAIN.sc_tx_pipe                        0         0.00 Session OK  TX_PIPE
MAIN.sc_tx_error                       0         0.00 Session Err TX_ERROR
MAIN.sc_tx_eof                         0         0.00 Session OK  TX_EOF
MAIN.sc_resp_close                     0         0.00 Session OK  RESP_CLOSE
MAIN.sc_overload                       0         0.00 Session Err OVERLOAD
MAIN.sc_pipe_overflow                  0         0.00 Session Err PIPE_OVERFLOW
MAIN.sc_range_short                    0         0.00 Session Err RANGE_SHORT
MAIN.sc_req_http20                     0         0.00 Session Err REQ_HTTP20
MAIN.sc_vcl_failure                    0         0.00 Session Err VCL_FAILURE
MAIN.client_resp_500                   0         0.00 Delivery failed due to insufficient workspace.
MAIN.ws_backend_overflow               0         0.00 workspace_backend overflows
MAIN.ws_client_overflow                0         0.00 workspace_client overflows
MAIN.ws_thread_overflow                0         0.00 workspace_thread overflows
MAIN.ws_session_overflow               0         0.00 workspace_session overflows
MAIN.shm_records               317750020      1178.99 SHM records
MAIN.shm_writes                 11262489        41.79 SHM writes
MAIN.shm_flushes                  393170         1.46 SHM flushes due to overflow
MAIN.shm_cont                     189298         0.70 SHM MTX contention
MAIN.shm_cycles                       60         0.00 SHM cycles through buffer
MAIN.backend_req                 1979183         7.34 Backend requests made
MAIN.n_vcl                             1          .   Number of loaded VCLs in total
MAIN.n_vcl_avail                       1          .   Number of VCLs available
MAIN.n_vcl_discard                     0          .   Number of discarded VCLs
MAIN.vcl_fail                          0         0.00 VCL failures
MAIN.bans                              1          .   Count of bans
MAIN.bans_completed                    1          .   Number of bans marked 'completed'
MAIN.bans_obj                          0          .   Number of bans using obj.*
MAIN.bans_req                          0          .   Number of bans using req.*
MAIN.bans_added                        1         0.00 Bans added
MAIN.bans_deleted                      0         0.00 Bans deleted
MAIN.bans_tested                       0         0.00 Bans tested against objects (lookup)
MAIN.bans_obj_killed                   0         0.00 Objects killed by bans (lookup)
MAIN.bans_lurker_tested                0         0.00 Bans tested against objects (lurker)
MAIN.bans_tests_tested                 0         0.00 Ban tests tested against objects (lookup)
MAIN.bans_lurker_tests_tested            0         0.00 Ban tests tested against objects (lurker)
MAIN.bans_lurker_obj_killed              0         0.00 Objects killed by bans (lurker)
MAIN.bans_lurker_obj_killed_cutoff            0         0.00 Objects killed by bans for cutoff (lurker)
MAIN.bans_dups                                0         0.00 Bans superseded by other bans
MAIN.bans_lurker_contention                   0         0.00 Lurker gave way for lookup
MAIN.bans_persisted_bytes                    16          .   Bytes used by the persisted ban lists
MAIN.bans_persisted_fragmentation             0          .   Extra bytes in persisted ban lists due to fragmentation
MAIN.n_purges                                 0         0.00 Number of purge operations executed
MAIN.n_obj_purged                             0         0.00 Number of purged objects
MAIN.exp_mailed                         1822034         6.76 Number of objects mailed to expiry thread
MAIN.exp_received                       1822034         6.76 Number of objects received by expiry thread
MAIN.hcb_nolock                         2111450         7.83 HCB Lookups without lock
MAIN.hcb_lock                           1599004         5.93 HCB Lookups with lock
MAIN.hcb_insert                         1599003         5.93 HCB Inserts
MAIN.esi_errors                               0         0.00 ESI parse errors (unlock)
MAIN.esi_warnings                             0         0.00 ESI parse warnings (unlock)
MAIN.vmods                                    2          .   Loaded VMODs
MAIN.n_gzip                                   0         0.00 Gzip operations
MAIN.n_gunzip                           1963548         7.29 Gunzip operations
MAIN.n_test_gunzip                      1604680         5.95 Test gunzip operations
LCK.backend.creat                             2         0.00 Created locks
LCK.backend.destroy                           0         0.00 Destroyed locks
LCK.backend.locks                       3958375        14.69 Lock Operations
LCK.ban.creat                                 1         0.00 Created locks
LCK.ban.destroy                               0         0.00 Destroyed locks
LCK.ban.locks                           3507518        13.01 Lock Operations
LCK.busyobj.creat                       1998678         7.42 Created locks
LCK.busyobj.destroy                     1998580         7.42 Destroyed locks
LCK.busyobj.locks                     139797131       518.71 Lock Operations
LCK.cli.creat                                 1         0.00 Created locks
LCK.cli.destroy                               0         0.00 Destroyed locks
LCK.cli.locks                             99103         0.37 Lock Operations
LCK.exp.creat                                 1         0.00 Created locks
LCK.exp.destroy                               0         0.00 Destroyed locks
LCK.exp.locks                          11454148        42.50 Lock Operations
LCK.hcb.creat                                 1         0.00 Created locks
LCK.hcb.destroy                               0         0.00 Destroyed locks
LCK.hcb.locks                           2982034        11.06 Lock Operations
LCK.lru.creat                                 2         0.00 Created locks
LCK.lru.destroy                               0         0.00 Destroyed locks
LCK.lru.locks                           3663810        13.59 Lock Operations
LCK.mempool.creat                             5         0.00 Created locks
LCK.mempool.destroy                           0         0.00 Destroyed locks
LCK.mempool.locks                      10186086        37.79 Lock Operations
LCK.objhdr.creat                        1599316         5.93 Created locks
LCK.objhdr.destroy                      1380547         5.12 Destroyed locks
LCK.objhdr.locks                       27912994       103.57 Lock Operations
LCK.pipestat.creat                            1         0.00 Created locks
LCK.pipestat.destroy                          0         0.00 Destroyed locks
LCK.pipestat.locks                            0         0.00 Lock Operations
LCK.sess.creat                           184447         0.68 Created locks
LCK.sess.destroy                         184449         0.68 Destroyed locks
LCK.sess.locks                          4161689        15.44 Lock Operations
LCK.tcp_pool.creat                            2         0.00 Created locks
LCK.tcp_pool.destroy                          0         0.00 Destroyed locks
LCK.tcp_pool.locks                      7901859        29.32 Lock Operations
LCK.vbe.creat                                 1         0.00 Created locks
LCK.vbe.destroy                               0         0.00 Destroyed locks
LCK.vbe.locks                             99094         0.37 Lock Operations
LCK.vcapace.creat                             1         0.00 Created locks
LCK.vcapace.destroy                           0         0.00 Destroyed locks
LCK.vcapace.locks                             0         0.00 Lock Operations
LCK.vcl.creat                                 1         0.00 Created locks
LCK.vcl.destroy                               0         0.00 Destroyed locks
LCK.vcl.locks                           4010919        14.88 Lock Operations
LCK.vxid.creat                                1         0.00 Created locks
LCK.vxid.destroy                              0         0.00 Destroyed locks
LCK.vxid.locks                              768         0.00 Lock Operations
LCK.waiter.creat                              2         0.00 Created locks
LCK.waiter.destroy                            0         0.00 Destroyed locks
LCK.waiter.locks                       11657249        43.25 Lock Operations
LCK.wq.creat                                  3         0.00 Created locks
LCK.wq.destroy                                0         0.00 Destroyed locks
LCK.wq.locks                           12078010        44.81 Lock Operations
LCK.wstat.creat                               1         0.00 Created locks
LCK.wstat.destroy                             0         0.00 Destroyed locks
LCK.wstat.locks                         7148985        26.53 Lock Operations
MEMPOOL.busyobj.live                          8          .   In use
MEMPOOL.busyobj.pool                         10          .   In Pool
MEMPOOL.busyobj.sz_wanted                 65536          .   Size requested
MEMPOOL.busyobj.sz_actual                 65504          .   Size allocated
MEMPOOL.busyobj.allocs                  1979190         7.34 Allocations
MEMPOOL.busyobj.frees                   1979182         7.34 Frees
MEMPOOL.busyobj.recycle                 1976206         7.33 Recycled from pool
MEMPOOL.busyobj.timeout                  176828         0.66 Timed out from pool
MEMPOOL.busyobj.toosmall                      0         0.00 Too small to recycle
MEMPOOL.busyobj.surplus                     137         0.00 Too many for pool
MEMPOOL.busyobj.randry                     2984         0.01 Pool ran dry
MEMPOOL.req0.live                             4          .   In use
MEMPOOL.req0.pool                            11          .   In Pool
MEMPOOL.req0.sz_wanted                   262144          .   Size requested
MEMPOOL.req0.sz_actual                   262112          .   Size allocated
MEMPOOL.req0.allocs                      979824         3.64 Allocations
MEMPOOL.req0.frees                       979820         3.64 Frees
MEMPOOL.req0.recycle                     979057         3.63 Recycled from pool
MEMPOOL.req0.timeout                     147071         0.55 Timed out from pool
MEMPOOL.req0.toosmall                         0         0.00 Too small to recycle
MEMPOOL.req0.surplus                          0         0.00 Too many for pool
MEMPOOL.req0.randry                         767         0.00 Pool ran dry
MEMPOOL.sess0.live                            6          .   In use
MEMPOOL.sess0.pool                           10          .   In Pool
MEMPOOL.sess0.sz_wanted                     512          .   Size requested
MEMPOOL.sess0.sz_actual                     480          .   Size allocated
MEMPOOL.sess0.allocs                      92389         0.34 Allocations
MEMPOOL.sess0.frees                       92383         0.34 Frees
MEMPOOL.sess0.recycle                     91719         0.34 Recycled from pool
MEMPOOL.sess0.timeout                     65581         0.24 Timed out from pool
MEMPOOL.sess0.toosmall                        0         0.00 Too small to recycle
MEMPOOL.sess0.surplus                         0         0.00 Too many for pool
MEMPOOL.sess0.randry                        670         0.00 Pool ran dry
LCK.sma.creat                                 2         0.00 Created locks
LCK.sma.destroy                               0         0.00 Destroyed locks
LCK.sma.locks                          36735852       136.31 Lock Operations
SMA.s0.c_req                           16689946        61.93 Allocator requests
SMA.s0.c_fail                                 0         0.00 Allocator failures
SMA.s0.c_bytes                     231951766672    860642.52 Bytes allocated
SMA.s0.c_freed                     203375575362    754612.35 Bytes freed
SMA.s0.g_alloc                          2052078          .   Allocations outstanding
SMA.s0.g_bytes                      28576191310          .   Bytes outstanding
SMA.s0.g_space                      12225998002          .   Bytes available
SMA.Transient.c_req                     2704021        10.03 Allocator requests
SMA.Transient.c_fail                          0         0.00 Allocator failures
SMA.Transient.c_bytes               75855969615    281458.83 Bytes allocated
SMA.Transient.c_freed               75855969615    281458.83 Bytes freed
SMA.Transient.g_alloc                         0          .   Allocations outstanding
SMA.Transient.g_bytes                         0          .   Bytes outstanding
SMA.Transient.g_space                8589934592          .   Bytes available
MEMPOOL.req1.live                             5          .   In use
MEMPOOL.req1.pool                             9          .   In Pool
MEMPOOL.req1.sz_wanted                   262144          .   Size requested
MEMPOOL.req1.sz_actual                   262112          .   Size allocated
MEMPOOL.req1.allocs                      974654         3.62 Allocations
MEMPOOL.req1.frees                       974649         3.62 Frees
MEMPOOL.req1.recycle                     973872         3.61 Recycled from pool
MEMPOOL.req1.timeout                     147049         0.55 Timed out from pool
MEMPOOL.req1.toosmall                         0         0.00 Too small to recycle
MEMPOOL.req1.surplus                          0         0.00 Too many for pool
MEMPOOL.req1.randry                         782         0.00 Pool ran dry
MEMPOOL.sess1.live                            5          .   In use
MEMPOOL.sess1.pool                            9          .   In Pool
MEMPOOL.sess1.sz_wanted                     512          .   Size requested
MEMPOOL.sess1.sz_actual                     480          .   Size allocated
MEMPOOL.sess1.allocs                      92071         0.34 Allocations
MEMPOOL.sess1.frees                       92066         0.34 Frees
MEMPOOL.sess1.recycle                     91430         0.34 Recycled from pool
MEMPOOL.sess1.timeout                     65371         0.24 Timed out from pool
MEMPOOL.sess1.toosmall                        0         0.00 Too small to recycle
MEMPOOL.sess1.surplus                         0         0.00 Too many for pool
MEMPOOL.sess1.randry                        641         0.00 Pool ran dry
VBE.boot.lcn.happy                            0          .   Happy health probes
VBE.boot.lcn.bereq_hdrbytes          1817898758      6745.20 Request header bytes
VBE.boot.lcn.bereq_bodybytes            4279860        15.88 Request body bytes
VBE.boot.lcn.beresp_hdrbytes          948671121      3519.98 Response header bytes
VBE.boot.lcn.beresp_bodybytes      281279816120   1043671.17 Response body bytes
VBE.boot.lcn.pipe_hdrbytes                    0         0.00 Pipe request header bytes
VBE.boot.lcn.pipe_out                         0         0.00 Piped bytes to backend
VBE.boot.lcn.pipe_in                          0         0.00 Piped bytes from backend
VBE.boot.lcn.conn                             8          .   Concurrent connections to backend
VBE.boot.lcn.req                        1979191         7.34 Backend requests sent
VBE.boot.lcn.unhealthy                        0         0.00 Fetches not attempted due to backend being unhealthy
VBE.boot.lcn.busy                             0         0.00 Fetches not attempted due to backend being busy
VBE.boot.lcn.fail                             0         0.00 Connections failed
VBE.boot.lcn.fail_eacces                      0         0.00 Connections failed with EACCES or EPERM
VBE.boot.lcn.fail_eaddrnotavail               0         0.00 Connections failed with EADDRNOTAVAIL
VBE.boot.lcn.fail_econnrefused                0         0.00 Connections failed with ECONNREFUSED
VBE.boot.lcn.fail_enetunreach                 0         0.00 Connections failed with ENETUNREACH
VBE.boot.lcn.fail_etimedout                   0         0.00 Connections failed ETIMEDOUT
VBE.boot.lcn.fail_other                       0         0.00 Connections failed for other reason
VBE.boot.lcn.helddown                         0         0.00 Connection opens not attempted

Jemalloc is jemalloc-5.2.1-2.el8.x86_64

Initially I have installed Varnish from CentOS 8's AppStream repo, later I changed it to Varnish community one but no luck there either. I have to mention that during Varnish installation, it didn't install any jemalloc package as I could detect. I installed the upper one manually.

Besides myself, I'm starting to suspect that this setup could be buggy with CentOS 8.


Solution

  • Finally, it seems that jemalloc 5.2 (used on CentOS 8) still has some memory leaks, or at least it has a signifficantly larger memory foorprint that the version 3.6 that I was using on Centos 7.

    Among the most, this post explains it quite nicely: https://github.com/jemalloc/jemalloc/issues/1152

    Apparently there was some bug fix regarding this in v5.2.1 https://github.com/jemalloc/jemalloc/releases

    but it doesn't seem to be the case in Linux, or at least there's still quite a huge difference compared to v3.6x.

    So, what I did is that I precompiled jemalloc v4.4 on Centos 8, after installing Varnish from it's on community repository, and it seems to be OK so far.

    I should not that one that comes with Centos 8 Varnish module (from AppStream), seems to be missing jemalloc completely, and is ignoring the set storage limits with malloc.