Search code examples
erlangets

Erlang ETS memory fragmentation


I have an erlang cluster where erlang:memory() 'total' is between 2-2.5GB from idle to busy time, day in day out. ets memory usage is around 440M and stays around there no matter what. The data within ets is heavily transient, completely changes throughout the day. Tomorrows data is guaranteed to have no commonality to today's.

Linux top says beam is using like 10 gigabytes. free -m 'used' agrees with that (the machine really only runs beam). The overall memory usage of the system grows regularly, like 1% per day on 16GB systems. There is some variance across nodes, but not by alot, and OS 'used' memory is always several times more than erlang:memory() total.

erlang:system_info({allocator, ets_alloc}) shows 20 allocators. Most have data that looks something like this (full output of command is here):

    {mbcs_pool,[{blocks,2054},
   {blocks_size,742672},
   {carriers,10},
   {carriers_size,17825792}]},

1) Does this mean that 742K bytes (words?) of memory are actually taking 17M of OS memory? 2) As this post suggests, should we add '+MEas bf' to the VM args, in order to reduce overhead? 3) What else can I do to avoid actually running out of memory?

This is R17.5 but we will be migrating to R19.3 in next deployment (this week). We don't have recon in the current deployment but will be adding it in the next deployment. Also, can't imagine this matters, but beam is running inside an alpine container.


Solution

  • In case someone else runs into this later: this was not actually leaked memory.

    The default memory allocator strategy of erlang may not be optimal for your use, depending what you do, and depending on how erlang is configured to allocate blocks. Turns out, in some cases, "free" memory from erlang point of view won't necessarily be immediately released to the OS due to allocator fragmentation.

    It's somewhat explained here: http://erlang.org/doc/man/erts_alloc.html

    The default allocator strategy for the version of erlang we used at the time is aoffcbf (address order first fit carrier best fit). In our case, this resulted in very high memory fragmentation (10+GB overhead worth). When troubleshooting these things, erlang:system_info(allocator) and erlang:system_info({allocator, Alloc}) are your friend. Changing to aobff (address order best fit) resulted in much more efficient memory usage. In truth, as long as the machine didn't run out of physical memory, it wouldn't matter, but for us, we were getting dangerously close to the physical limit. And you do not want to start paging. With aobff, we never passed 4GB, even after the node being up 18 months. With the aoffcbf we would pass 10GB in a few weeks.

    As always, YMMV, as it all depends what type, size, etc.. of blocks are allocated, and how long they live.