ruby-on-rails, ruby, memory-leaks, rubygems, puma

How to analyze a Ruby on Rails memory leak?


I am dealing with a legacy system (Ruby 2.7.6) that suffers from a memory leak. The previous developers worked around it with puma worker killer, which restarts the process every 30 minutes. As traffic increases, we now need to add instances and shorten the restart interval from 30 minutes to as little as 20.

We would like to find the source of this memory leak, which apparently originates in one of our many gem dependencies (according to a previous developer).

The system runs on AWS (Elastic Beanstalk) but can also run on Docker. Can anyone suggest a good tool, and a guide on how to find the source of this memory leak? Thanks

UPDATE: I used rack-mini-profiler and took memory snapshots to see the effect of about 100 requests on the server [BEFORE, DURING, AFTER].

Judging by the output, there does not seem to be a memory leak in Ruby itself. Still, the memory usage reported by the system increased and stayed up, even though it does not seem to be used by our code.

BEFORE:

KiB Mem :  2007248 total,   628156 free,   766956 used,   612136 buff/cache
KiB Swap:  2097148 total,  2049276 free,    47872 used.  1064852 avail Mem

Total allocated: 115227 bytes (1433 objects)
Total retained: 21036 bytes (147 objects)

allocated memory by gem

 33121  activesupport-6.0.4.7
 21687  actionpack-6.0.4.7
 14484  activerecord-6.0.4.7
 12582  var/app
  9904  ipaddr
  6957  rack-2.2.4
  3512  actionview-6.0.4.7
  2680  mysql2-0.5.3
  1813  rack-mini-profiler-3.0.0
  1696  audited-5.0.2
  1552  concurrent-ruby-1.1.10

DURING:

KiB Mem :  2007248 total,    65068 free,  1800424 used,   141756 buff/cache
KiB Swap:  2097148 total,  2047228 free,    49920 used.    58376 avail Mem

Total allocated: 225272583 bytes (942506 objects)
Total retained: 1732241 bytes (12035 objects)

allocated memory by gem

106497060  maxmind-db-1.0.0
 58308032  psych
 38857594  user_agent_parser-2.7.0
  4949108  activesupport-6.0.4.7
  3967930  other
  3229962  activerecord-6.0.4.7
  2154670  rack-2.2.4
  1467383  actionpack-6.0.4.7
  1336204  activemodel-6.0.4.7

AFTER:

KiB Mem :  2007248 total,    73760 free,  1817688 used,   115800 buff/cache
KiB Swap:  2097148 total,  2032636 free,    64512 used.    54448 avail Mem

Total allocated: 109563 bytes (1398 objects)
Total retained: 14988 bytes (110 objects)

allocated memory by gem

 29745  activesupport-6.0.4.7
 21495  actionpack-6.0.4.7
 13452  activerecord-6.0.4.7
 12502  var/app
  9904  ipaddr
  7237  rack-2.2.4
  3128  actionview-6.0.4.7
  2488  mysql2-0.5.3
  1813  rack-mini-profiler-3.0.0
  1360  audited-5.0.2
  1360  concurrent-ruby-1.1.10

Where can the leak be, then? Is it Puma?


Solution

  • It seems from the statistics in the question that most objects get freed properly by the memory allocator.

    However - when you have a lot of repeated allocations, the system's malloc can sometimes (and often does) hold the memory without releasing it to the system (Ruby isn't aware of this memory that is considered "free").

    This is done for 2 main reasons:

    1. Most importantly: heap fragmentation (the allocator is unable to free the memory and unable to use parts of it for future allocations).

    2. The system's memory allocator knows it would probably need this memory again soon (that's in relation to the part of the memory that can be freed and doesn't suffer from fragmentation).

    This can be solved by trying to replace the system's memory allocator with an allocator that's tuned for your specific needs (e.g., jemalloc, as suggested here and here, and asked about here).
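
    If you do try jemalloc, it helps to confirm that the running process is really using it. The following is a minimal sketch, not an official check: the helper names `jemalloc_linked?` and `jemalloc_preloaded?` are made up for illustration, and the second one assumes Linux, where `/proc/self/maps` lists the libraries mapped into the process.

```ruby
require 'rbconfig'

# True if this Ruby was linked against jemalloc at build time.
# Looks at the linker flags recorded in RbConfig.
def jemalloc_linked?(config = RbConfig::CONFIG)
  [config['MAINLIBS'], config['LIBS']].compact.any? { |libs| libs.include?('jemalloc') }
end

# True if a jemalloc shared library is mapped into this process,
# for example because it was preloaded. Linux only: returns false
# when the maps file is not readable.
def jemalloc_preloaded?(maps_path = '/proc/self/maps')
  return false unless File.readable?(maps_path)

  File.foreach(maps_path).any? { |line| line.include?('jemalloc') }
end

puts "linked at build time: #{jemalloc_linked?}"
puts "mapped into process:  #{jemalloc_preloaded?}"
```

    If both print `false`, the process is most likely still on the system allocator.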

    You could also try to use gems that have a custom memory allocator when using C extensions (the iodine gem does that, but you could make other gems do it too).

    This approach should help mitigate the issue, but the fact is that some of your gems appear memory hungry... I mean...:

    • is the maxmind-db gem using 106,497,060 bytes (106MB) of memory or did it allocate that number of objects?

    • and why is psych so hungry? are there any roundtrips between data and YAML that could be skipped?

    • there seems to be a lot of user agent strings stored concurrently... (the user_agent_parser gem)... maybe you could make a cache of these strings instead of having a lot of duplicates. For example, you could make a Set of these strings and replace each String object with the object in the Set. This way equal strings would point at the same object (preventing some object duplication and freeing up some memory).
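
    The string cache idea could look roughly like this. It is only a sketch: `StringPool` is a hypothetical helper, it uses a Hash rather than a Set so the stored object can be returned, and it grows without bound, so a real version would probably need a size limit.

```ruby
# Hypothetical helper: keeps one canonical frozen copy of each distinct string.
class StringPool
  def initialize
    @pool = {}
  end

  # Returns the pooled copy of +str+, storing a frozen copy on first sight.
  # The key is frozen before insertion so the Hash does not make its own
  # extra copy of it.
  def intern(str)
    @pool.fetch(str) do
      canonical = str.frozen? ? str : str.dup.freeze
      @pool[canonical] = canonical
    end
  end

  def size
    @pool.size
  end
end

pool = StringPool.new
a = pool.intern("Mozilla/5.0 (X11; Linux x86_64)")
b = pool.intern(String.new("Mozilla/5.0 (X11; Linux x86_64)"))

puts a.equal?(b) # => true (both names point at the same object)
puts pool.size   # => 1
```

    Ruby's built-in `-str` (`String#-@`) offers similar deduplication for frozen strings and may be enough on its own.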

    Is it Puma?

    Probably not.

    Although I am the author of the iodine web server, I really love the work the Puma team did over the years and think it's a super solid server for what it offers. I really doubt the leak is from the server, but you can always switch and see what happens.


    Re: the difference between the Linux report and the Ruby profiler

    The difference is in the memory held by malloc: "free" memory that isn't returned to the system and that Ruby doesn't know about.

    Ruby profilers test the memory Ruby allocated ("live" memory, if you will). They have access to the number of objects allocated and the memory held by those objects.

    The malloc library isn't part of Ruby. It's part of the C runtime library on top of which Ruby sits.

    There's memory allocated for the process by malloc that isn't used by Ruby. That memory is either waiting to be used (retained by malloc for future use) or waiting to be released back to the system (or fragmented and lost for the moment).

    That difference between what Ruby uses and what malloc holds should explain the gap between the Linux report and the Ruby profiler report.

    Some gems might be using their own custom-made memory allocator (e.g., iodine does that). These behave the same as malloc in the sense that the memory they hold will not show up in the Ruby profiler (at least not completely).
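
    To see that gap from inside the process, you can compare the resident set size the OS reports with what Ruby itself accounts for. This is a rough sketch under some assumptions: it reads `/proc/self/status`, so the RSS part only works on Linux, and the method names are invented for illustration. `ObjectSpace.memsize_of_all` is itself an approximation.

```ruby
require 'objspace'

# Resident set size of this process in KiB, read from /proc (Linux only).
# Returns nil when the file or the VmRSS field is not available.
def process_rss_kib(status_path = '/proc/self/status')
  return nil unless File.readable?(status_path)

  line = File.foreach(status_path).find { |l| l.start_with?('VmRSS:') }
  line && line[/\d+/].to_i
end

# Approximate size in KiB of the objects Ruby knows about.
def ruby_objects_kib
  ObjectSpace.memsize_of_all / 1024
end

# Runs a GC first so the Ruby-side numbers reflect live objects only.
def memory_report
  GC.start
  {
    rss_kib: process_rss_kib,
    ruby_objects_kib: ruby_objects_kib,
    live_slots: GC.stat(:heap_live_slots)
  }
end

p memory_report
```

    Logging this before and after a burst of requests should show `rss_kib` staying high while `ruby_objects_kib` drops back, which is the pattern described above.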