Tags: memory, malloc, embedded-linux, swapfile

Detecting low memory situations in Embedded Linux


My team develops a complex multiprocess, C++-based system running on Embedded Linux. Since there is no swap partition, a gradually growing memory leak can cause major trouble. (Let's assume, for the sake of this discussion, that all memory allocated in the system is filled with nonzero data.)

Now, as answered (tersely) here, when the operating system is out of RAM and has no swap, it discards clean pages. As far as I understand, the only "clean" pages in this situation are those containing const data and currently/recently executing code (from the Linux environment, and in particular our own executables and shared libraries), which may be harmlessly discarded and later reloaded from the filesystem as needed.

At first, the least recently used pages are the first to go, so this is hardly noticed; but as more and more memory is allocated and the amount of wiggle room shrinks, code that is needed more often gets swapped out and then back in. The system starts to silently and invisibly thrash, but the only sign we see is the system becoming slower and less responsive, until eventually the kernel's oom-killer steps in and does its thing.

This situation doesn't necessarily require a memory leak; it can happen simply because the natural memory requirements of our software exceed the available RAM. Such a situation is even harder to catch because the system won't crash, and the performance hit caused by the thrashing is not always immediately noticeable and can be confused with other causes of bad performance (such as an inefficient algorithm).

I'm looking for a way to catch and flag this issue unambiguously, before performance starts getting hit; ideally I'd like to monitor the number of clean page discards that occur, hopefully without requiring a specially rebuilt kernel. Then I can establish some threshold beyond which an error will be raised. Of course, any better ideas will be appreciated too.

I've tried other approaches, such as monitoring process memory usage with top, or having processes police their own usage with mallinfo(3), but these don't catch all situations, nor do they clearly answer the question of what the overall memory status is. Another thing I've looked at is the "free" column in the output of free, but that can show a low value whether or not thrashing is taking place.
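
For the record, the mallinfo(3) self-policing amounts to something like the sketch below; the field used and the budget are illustrative, not our actual code, and the key limitation is that it only sees the calling process's own heap:

    #include <malloc.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Illustrative per-process check: compare the heap bytes currently
       allocated (uordblks) against a hard-coded budget.  This only sees
       the calling process, not the system as a whole. */
    static void check_own_heap_usage(size_t budget_bytes)
    {
        struct mallinfo mi = mallinfo();

        if ((size_t)mi.uordblks > budget_bytes)
            fprintf(stderr, "warning: heap usage %d bytes exceeds budget\n",
                    mi.uordblks);
    }

    int main(void)
    {
        /* Simulate growth with many small allocations; these land in the
           main arena, which is what uordblks tracks.  (Intentionally not
           freed - this is just a demo.) */
        for (int i = 0; i < 1000; i++)
            (void)malloc(64 * 1024);

        check_own_heap_usage(32 * 1024 * 1024);   /* 32 MiB example budget */
        return 0;
    }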


Solution

  • Alex's answer pointed me in the right direction by mentioning page faults, but the more specific answer is major page faults. From the perf_event_open(2) man page:

     PERF_COUNT_SW_PAGE_FAULTS_MAJ
    

    This counts the number of major page faults. These required disk I/O to handle.

    So while these are not the clean page discards I asked about, they are their corollary - they indicate when something that was previously discarded gets read back in from disk. In a swapless system, the only things that can be read back in from disk are clean pages. In my tests I've found that these faults are normally few and far between, but spike suddenly when memory is low (on my system it's something like 3 or more faults per second sustained for over 5 consecutive seconds), and this indication is consistent with the system becoming slower and less responsive.
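
    As an aside (not part of the original answer): the kernel also exposes a cumulative system-wide count of major faults as the pgmajfault line in /proc/vmstat, so you can eyeball the fault rate without any perf plumbing. A minimal sketch that samples it once per second:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Read the cumulative major fault count from /proc/vmstat.
       Returns 0 if the file or the field cannot be read. */
    static unsigned long long read_pgmajfault(void)
    {
        char key[64];
        unsigned long long val;
        unsigned long long result = 0;
        FILE *f = fopen("/proc/vmstat", "r");

        if (!f)
            return 0;
        while (fscanf(f, "%63s %llu", key, &val) == 2) {
            if (strcmp(key, "pgmajfault") == 0) {
                result = val;
                break;
            }
        }
        fclose(f);
        return result;
    }

    int main(void)
    {
        unsigned long long prev = read_pgmajfault();

        for (;;) {
            sleep(1);
            unsigned long long cur = read_pgmajfault();
            printf("major faults/s: %llu\n", cur - prev);
            prev = cur;
        }
    }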

    As for actually querying this statistic, this was answered in Measure page faults from a c program, but I recommend starting from the code example at the bottom of the perf_event_open(2) man page (see link above), with this change:

    pe.type = PERF_TYPE_SOFTWARE;
    pe.config = PERF_COUNT_SW_PAGE_FAULTS_MAJ;
    

    Assuming you want a system-wide statistic, rather than one pertaining only to the current process, change the actual open line to:

    fd = perf_event_open(&pe, -1, cpu, -1, 0);
    

    The cpu argument here is tricky. On a single-core, single-CPU system, just set it to 0. Otherwise you will have to open a separate performance counter (with a separate fd) for each core, read them all, and sum their results. For a thread explaining why, see here. The easiest way to get the number of cores is get_nprocs(3). A sketch that puts all of the above together follows.
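
    This is only a sketch, not a drop-in implementation: the thresholds are the rule-of-thumb values from above, error handling is minimal, and system-wide counters (pid = -1) typically require root or a permissive /proc/sys/kernel/perf_event_paranoid setting. It opens one counter per core, sums them once per second, and complains when the major fault rate stays high for several consecutive seconds:

    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <sys/sysinfo.h>
    #include <unistd.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Thin wrapper; glibc provides no perf_event_open() function. */
    static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                                int cpu, int group_fd, unsigned long flags)
    {
        return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
    }

    int main(void)
    {
        int ncpus = get_nprocs();
        int *fds = calloc(ncpus, sizeof(*fds));
        struct perf_event_attr pe;

        memset(&pe, 0, sizeof(pe));
        pe.type = PERF_TYPE_SOFTWARE;
        pe.size = sizeof(pe);
        pe.config = PERF_COUNT_SW_PAGE_FAULTS_MAJ;
        pe.disabled = 0;                 /* start counting immediately */
        pe.exclude_hv = 1;

        /* One counter per core; pid = -1 means "every process on that core".
           (Assumes core numbers 0..ncpus-1 are all online.) */
        for (int cpu = 0; cpu < ncpus; cpu++) {
            fds[cpu] = (int)perf_event_open(&pe, -1, cpu, -1, 0);
            if (fds[cpu] == -1) {
                perror("perf_event_open");
                return 1;
            }
        }

        uint64_t prev = 0;
        int high_seconds = 0;

        for (;;) {
            sleep(1);

            uint64_t total = 0;
            for (int cpu = 0; cpu < ncpus; cpu++) {
                uint64_t count = 0;
                if (read(fds[cpu], &count, sizeof(count)) == (ssize_t)sizeof(count))
                    total += count;
            }

            uint64_t per_second = total - prev;
            prev = total;

            /* Rule of thumb from above: 3+ major faults/s, 5 s running. */
            high_seconds = (per_second >= 3) ? high_seconds + 1 : 0;
            if (high_seconds >= 5)
                fprintf(stderr, "warning: sustained major page faults "
                                "(%llu/s) - memory is probably low\n",
                        (unsigned long long)per_second);
        }
    }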