linux haskell memory-leaks profiling ghc

What memory leaks can occur outside the view of GHC's heap profiler?


I have a program that exhibits the behavior of a memory leak. It gradually takes up all of the system's memory until it fills all swap space, and then the operating system kills it. This happens once every several days.

I have profiled the heap extensively in a number of ways (-hy, -hm, -hc), tried limiting the heap size (-M128M), and tweaked the number of generations (-G1), but no matter what I do, the heap size appears roughly constant and always low (measured in kB, not MB or GB). Yet when I observe the program in htop, its resident memory steadily climbs.

What this indicates to me is that the memory leak is coming from somewhere besides the GHC heap. My program makes use of dependencies, specifically Haskell's yaml library, which wraps the C library libyaml. It is possible that the leak lies in the foreign pointers it holds to objects allocated by libyaml.

My question is threefold:

  1. What places besides the GHC heap can memory leak from in a Haskell program?
  2. What tools can I use to track these down?
  3. What changes to my source code need to be made to avoid these types of leaks, as they seem to differ from the more commonly experienced space leaks in Haskell?

Solution

  • This certainly sounds like foreign pointers aren't being finalized properly. There are several possible reasons for this:

    1. The underlying C library doesn't free memory properly.
    2. The Haskell library doesn't set up finalization properly.
    3. The ForeignPtr objects aren't being freed.

    I think there's actually a decent chance that it's option 3. If the RTS consistently finds enough memory in the first GC generation, then it just won't bother running a major collection, so the finalizers attached to older ForeignPtrs never get a chance to run. Fortunately, this is the easiest to diagnose. Just have your program call System.Mem.performGC every so often. If that fixes it, you've found the bug and can tune how often you want to do that.
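As a rough way to run that experiment, here is a minimal sketch that forks a background thread calling performGC (from the System.Mem module in base) at a fixed interval. The name startPeriodicGC, the one-second interval, and the threadDelay standing in for real work are all assumptions made for illustration, not anything from the program in the question.

```haskell
import Control.Concurrent (ThreadId, forkIO, killThread, threadDelay)
import Control.Monad (forever)
import System.Mem (performGC)

-- | Hypothetical helper: fork a thread that forces a major collection
-- every @seconds@ seconds, giving ForeignPtr finalizers a chance to run.
startPeriodicGC :: Int -> IO ThreadId
startPeriodicGC seconds = forkIO . forever $ do
  threadDelay (seconds * 1000000)  -- threadDelay takes microseconds
  performGC                        -- triggers a major GC

main :: IO ()
main = do
  gcThread <- startPeriodicGC 1
  threadDelay 2500000              -- placeholder for the program's real work
  killThread gcThread
  putStrLn "done"
```

If resident memory stops climbing with this in place, the interval can be lengthened until the cost of the extra collections is acceptable. If it makes no difference, options 1 and 2 become more likely.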

    Another possible issue is that you could have foreign pointers lying around in long-lived thunks or other closures. Make sure you don't.


    One particularly strong possibility when working with a wrapped C library is that the wrapper functions will return ByteStrings whose underlying arrays were allocated by C code. So any ByteStrings you get back from yaml could potentially be off-heap.
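If that turns out to be the case, one thing that may help is copying the bytes you actually want to keep, so that the long-lived value no longer references the original buffer. The sketch below uses Data.ByteString.copy, which gives the result its own storage. The detach helper and the packed string standing in for a buffer returned by a C wrapper are assumptions for illustration; whether yaml really hands back such buffers in your program is something to verify.

```haskell
import Control.Exception (evaluate)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- | Hypothetical helper: copy a ByteString into fresh storage so the
-- result no longer shares (and keeps alive) the original buffer.
detach :: B.ByteString -> B.ByteString
detach = B.copy

main :: IO ()
main = do
  -- Stand-in for a large buffer handed back by a wrapped C library.
  let big   = BC.pack (replicate 100000 'x' ++ "key")
      slice = B.drop 100000 big    -- shares storage with 'big'
  -- Force the copy now; an unevaluated thunk would still reference 'slice'.
  kept <- evaluate (detach slice)
  print (B.length kept)            -- prints 3
```

The evaluate call matters here: a lazily held copy is just another closure pointing at the original buffer, which is the thunk problem mentioned above.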