D Garbage Collector - how to estimate how often and how long it will run?


I am learning the D language and I like a lot of its features, but I am a bit skeptical about the GC. I would like to give it a chance, but first I would like to know:

  • How to estimate how often it will run?
  • How long it will run? Is it proportional to the amount of allocated memory, amount of managed objects, ...?

I am asking specifically about the GC in the current D2 runtime.

I know a GC can give better performance in some cases, but what about this example: imagine a game engine that allocates a lot of memory at the beginning of the game (hundreds of megabytes of complicated structures) but does almost no allocations or deallocations while the game is running. Some still happen, though (for example from string operations in the GUI). Are those small allocations eventually going to trigger a GC run that has to scan all of the allocated memory? As I understand it, even if I decide to manage the memory of most of the data myself, I have to register those ranges with the GC if I want them to be able to hold any references to managed memory (like strings).

Of course, I can program first and profile later, but I would like to be able to make at least some estimates of the performance in advance, which is better than resorting to workarounds later. (Solutions like free lists are ugly workarounds in my opinion and can't be used everywhere anyway.)


Solution

  • I don't have exact answers, but here's what I can say:

    1) The most reliable source would be the GC's source code: https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d (I'm pretty sure the very similar gcx.d file

    2) The GC can be expected to run when you do a GC allocation, at program termination, and nowhere else. (Strictly speaking, it only tries to collect existing garbage when it thinks it needs to allocate a new block, but in my experience it is best in practice to assume that every single new can trigger a collection.) If you don't allocate GC memory, the GC won't actually do anything. It doesn't stop your program at random.

    Though, it might seem random if you don't know where to look. Check the bottom of this page: http://dlang.org/garbage.html

    The one that most often gets me is the array literal: auto x = [1,2,3]; is a runtime gc allocation! There's a fair number of phobos functions that do gc allocations too, though not all of them. If a phobos function ever returns an array (including a string), odds are high that it allocates - if nothing else, the return value is liable to be a new block, unless you know you passed it a buffer to receive the data.
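
    As a rough illustration of the array literal point, here is a minimal sketch. It assumes a compiler recent enough to support the @nogc attribute, and sumOnStack is a made-up name. Initializing a fixed-size array from a literal keeps the data on the stack, while the same literal bound to auto would be rejected inside a @nogc function because it allocates on the GC heap.

```d
// Hypothetical helper; assumes a compiler that supports @nogc.
@nogc int sumOnStack()
{
    // A static (fixed-size) array initialized from a literal lives on the
    // stack, so this line performs no GC allocation.
    int[3] x = [1, 2, 3];

    // By contrast, the following line would be a compile error here,
    // because the literal allocates a dynamic array on the GC heap:
    // auto y = [1, 2, 3];

    int total = 0;
    foreach (v; x)
        total += v;
    return total;
}

void main()
{
    assert(sumOnStack() == 6);
}
```

    Marking the functions called from the main loop as @nogc is one way to have the compiler point out such hidden allocations, although it only covers code that can actually carry the attribute.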

    That said, much of phobos is actually allocation free, and getting better with each release. I believe all of std.algorithm and the std.digest package is allocation free now, among others. So you don't have to throw it all out, just know which functions to avoid.

    If you've written a program and want to find hidden allocations, I would use a debugger. Set a breakpoint right before your main loop. Then set breakpoints at gc_malloc and gc_qalloc and continue. If it breaks, get the stack trace and now you know what to avoid later.

    If your main loop is gc allocation free, it will also be gc collection free.
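
    One way to check this claim on a concrete program is to compare the collection counter before and after the loop. The sketch below assumes a druntime that provides GC.profileStats in core.memory (older releases do not have it); updateFrame and collectionsDuringLoop are hypothetical names standing in for real game code.

```d
import core.memory : GC;

// Hypothetical stand-in for one frame of a game loop. It only touches
// memory that was allocated up front, so it can be marked @nogc.
@nogc void updateFrame(int[] state)
{
    foreach (ref v; state)
        v += 1;
}

// Runs `frames` iterations and returns how many GC collections
// happened while the loop was running.
size_t collectionsDuringLoop(int[] state, int frames)
{
    immutable before = GC.profileStats().numCollections;
    foreach (i; 0 .. frames)
        updateFrame(state);
    return GC.profileStats().numCollections - before;
}

void main()
{
    auto state = new int[](1024); // GC allocation, done before the loop
    assert(collectionsDuringLoop(state, 10_000) == 0);
    assert(state[0] == 10_000);
}
```

    If the returned count is ever nonzero in a real program, something inside the loop (or another thread) allocated from the GC heap.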

    3) Will the GC scan over all the memory? Not necessarily. There's a noscan flag in the GC implementation (see the mark function in the source) that lets it skip blocks. In druntime/src/rt/lifetime.d, you can see that this flag (called BlkAttr.NO_SCAN there) is set based on the TypeInfo. Scanning isn't quite precise, but I'm pretty sure the flag is set correctly on things like big array allocations. Your game's bulk data assets should not be scanned.

    So, the time it takes would be proportional to the amount of memory it actually scans, which can be quite a bit less than the amount you've allocated.
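
    The NO_SCAN flag can also be inspected from user code. The sketch below uses GC.getAttr, GC.addrOf and GC.malloc from core.memory; isNoScan is a made-up helper, and the exact attribute behavior is an assumption about the default GC implementation rather than a guarantee.

```d
import core.memory : GC;

// Hypothetical helper: true if the GC block containing p is flagged NO_SCAN.
// GC.addrOf maps a possibly interior pointer to the base of its block,
// which is what GC.getAttr expects.
bool isNoScan(void* p)
{
    return (GC.getAttr(GC.addrOf(p)) & GC.BlkAttr.NO_SCAN) != 0;
}

void main()
{
    // Bulk data without pointers: the element type tells the runtime
    // that the block never needs to be scanned.
    auto pixels = new ubyte[](4 * 1024 * 1024);
    assert(isNoScan(pixels.ptr));

    // An array of pointers may reference other GC memory, so it is scanned.
    auto nodes = new int*[](1024);
    assert(!isNoScan(nodes.ptr));

    // The flag can also be requested explicitly for raw allocations.
    void* raw = GC.malloc(1024, GC.BlkAttr.NO_SCAN);
    assert(isNoScan(raw));
}
```

    Under these assumptions, hundreds of megabytes of pointer-free asset data add little to the mark phase, while structures full of pointers still have to be walked.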