java java-8 garbage-collection jvm concurrent-mark-sweep

Can a short-lived object that references a long-lived object increase GC time?


I need some clarification about how minor GC collections behave. In a long-lived application, could calling a() perform worse than calling b() as the old space gets bigger?

// An Example instance lives for the whole application life cycle (24x7)
public class Example {

    private Object longLived = new Object();

    public void a() {
        var shortLived = new ShortLivedObject(longLived); // longLived is now a field of shortLived
        shortLived.doSomething();
    }

    public void b() {
        new ShortLivedObject().doSomething(new Object()); // the argument is now short-lived too
    }

}

Where does my doubt come from? I found that in an app in which the used tenured space keeps growing, minor GC pauses increase as well.

Running some tests, I found that if I force one JVM to use option a() and another JVM to use option b(), the JVM using b() has shorter pauses as the old space gets bigger, but I can't figure out why.

[Figure: GC CPU utilization time]

I solved the issue in the app by setting -XX:ParGCCardsPerStrideChunk=4096, but I want to know whether the situation described above can increase GC times because scanning the card table gets slower, whether something else I don't know about is going on, or whether it is not related at all.


Solution

  • Disclaimer: I am by no means a GC expert, but lately I have been getting into these details for fun.

    As I said in the comments, you are using a deprecated collector: no one supports it and no one wants to use it. Switch to G1 or, even better IMHO, to Shenandoah; start with this simple thing first.

    I can only assume that you increased ParGCCardsPerStrideChunk from its default value and that this probably helped by a few ms (though we have no proof of that). We also have no GC logs, no CPU activity data, etc., so this is pretty complicated to answer.

    If you do have a big heap (tens of GB), a big young space and enough GC threads, setting that parameter to a bigger value might indeed help, and it might even be related to the card table you mention. Read on to see why.
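
    Since the missing evidence is the main obstacle here, this is a minimal sketch of one way to pull per-collector counts and accumulated collection time out of a running JVM with the standard java.lang.management API. The class name GcStats is made up for illustration, and real GC logs would still tell you far more than these totals.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints what the JVM itself reports about its collectors.
final class GcStats {

    /** One line per collector: name, number of collections, accumulated time in ms. */
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Collector names depend on which GC the JVM was started with.
        System.out.print(report());
    }
}
```

    Sampling this before and after a test run in both JVMs (the a() one and the b() one) would at least show whether the young collector's totals really diverge.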

    CMS splits the heap into an old space and a young space. It could have chosen any other discriminator, but its designers chose age (just like G1). Why is that needed? To be able to scan and collect only part of the heap (scanning it entirely is very expensive). The young space is collected with a stop-the-world pause, so it had better be small, otherwise you will not be happy; that is also why you usually see many more young collections than old ones.

    The only problem when you scan young space is: what happens if there are references from old space to objects from young space? Collecting those is obviously wrong, but scanning the entire old space to find out that answer would defeat the purpose of generational collections entirely. Thus: card table.

    This keeps track of references from old space to young space, so the collector knows which young objects are still in use and which are garbage. G1 uses a card table too, but also adds a RememberedSet (not going into the details here). In practice, RememberedSets turned out to be HUGE, which is why G1 became generational. (FYI: Shenandoah uses a matrix instead of a card table, making it not generational.)
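
    To make the idea concrete, here is a toy model of a card table. It is a sketch only and NOT how HotSpot implements it: the class ToyCardTable, its method names and the offset-based "old space" are all made up for illustration. The only number taken from the discussion is the 512-byte card size.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model: old space is a range of byte offsets, one dirty bit per 512-byte card.
final class ToyCardTable {
    static final int CARD_SIZE = 512; // bytes of old space covered by one card

    private final BitSet dirty;
    private final int cards;

    ToyCardTable(int oldSpaceBytes) {
        this.cards = (oldSpaceBytes + CARD_SIZE - 1) / CARD_SIZE; // round up
        this.dirty = new BitSet(cards);
    }

    /** Stand-in for a write barrier: a reference field at this old-space offset was written. */
    void onReferenceStore(int oldSpaceOffset) {
        dirty.set(oldSpaceOffset / CARD_SIZE);
    }

    /** Cards a young collection would have to scan for old-to-young references. */
    List<Integer> cardsToScan() {
        List<Integer> result = new ArrayList<>();
        for (int i = dirty.nextSetBit(0); i >= 0; i = dirty.nextSetBit(i + 1)) {
            result.add(i);
        }
        return result;
    }

    int totalCards() {
        return cards;
    }

    public static void main(String[] args) {
        ToyCardTable table = new ToyCardTable(1024 * 1024); // 1 MB of pretend old space
        table.onReferenceStore(100); // falls into card 0
        table.onReferenceStore(600); // falls into card 1
        table.onReferenceStore(700); // card 1 again, already dirty
        System.out.println(table.cardsToScan().size() + " of " + table.totalCards() + " cards need scanning");
    }
}
```

    The point of the sketch is that a young collection only looks at the dirty cards, not at the whole old space; how many cards are dirty, and how they are split among GC threads, is what drives the cost.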

    So this huge intro was to show that increasing ParGCCardsPerStrideChunk might indeed have helped. You are giving each GC thread more space to work on per chunk. The default value is 256 and each card in the card table covers 512 bytes of heap, which means

    256 * 512 = 128KB per stride of old generation
    

    If you have, for example, a heap of 32 GB, how many hundreds of thousands of strides is that? Probably too many.

    Now, why do you also bring reference counting into the discussion here? I have no idea.
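
    The arithmetic can be spelled out as a small sketch. The class name StrideMath is made up, and it assumes (unrealistically) that the entire 32 GB is old generation; the 512-byte card size and the values 256 and 4096 are the ones from this discussion.

```java
// Back-of-the-envelope stride arithmetic for the numbers quoted above.
final class StrideMath {
    static final long CARD_SIZE_BYTES = 512; // heap bytes covered by one card

    /** Bytes of old generation covered by one stride chunk. */
    static long strideBytes(long cardsPerStrideChunk) {
        return cardsPerStrideChunk * CARD_SIZE_BYTES;
    }

    /** Number of stride chunks needed to cover oldGenBytes, rounded up. */
    static long strideCount(long oldGenBytes, long cardsPerStrideChunk) {
        long stride = strideBytes(cardsPerStrideChunk);
        return (oldGenBytes + stride - 1) / stride;
    }

    public static void main(String[] args) {
        long oldGen = 32L * 1024 * 1024 * 1024; // assumption: all 32 GB is old generation
        for (long cards : new long[] {256, 4096}) {
            System.out.printf("cards=%d stride=%d KB strides=%d%n",
                    cards, strideBytes(cards) / 1024, strideCount(oldGen, cards));
        }
    }
}
```

    Under that assumption the default of 256 gives 262,144 strides of 128 KB each, while 4096 gives 16,384 strides of 2 MB each.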


    The examples that you have shown have different semantics and as such are kind of difficult to reason about; I'll still try to, though. You have to understand that reachability of Objects is just a graph that starts from some roots (called GC roots). Let's take this example first:

    public void b(){
       new ShortLivedObject().doSomething(new Object()); // the argument is now short-lived too
    }
    

    The ShortLivedObject instance is "forgotten" as soon as the doSomething invocation is done; its scope is the method only, so no one can reach it. What remains is the parameter of doSomething: new Object(). If doSomething does not do anything fishy with the parameter it got (such as making it reachable from a GC root), then after doSomething is done, it becomes eligible for GC too. But even if doSomething makes new Object() reachable, the ShortLivedObject instance is still eligible for GC.

    As such, even if Example is reachable (meaning it can't be collected), ShortLivedObject and new Object() can potentially be collected. It can look like this:

                     new Object()
                          |
                         \ /
                   ShortLivedObject           
                          |
                         \ /
    GC Root -> ... - > Example
    

    You can see that when the GC scans the Example instance, it might not scan ShortLivedObject at all (that is why garbage is defined as the opposite of live objects). So a GC algorithm will simply discard that entire part of the graph without scanning it.


    The second example is different:

    public void a(){
        var shortLived = new ShortLivedObject(longLived);
        shortLived.doSomething();
    }
    

    The difference is that longLived here is an instance field and, as such, the graph will look a bit different:

                    ShortLivedObject
                          |
                         \ /
                      longLived         
                         / \
                          |
    GC Root -> ... - > Example
    

    It's obvious that ShortLivedObject can be collected in this case, but not longLived.

    What you have to understand is that none of this matters if the Example instance itself can be collected: this graph will not be traversed, and everything that Example uses can be collected.

    You should be able to understand now that using method a can retain a bit more garbage, which can potentially be moved to old space (once the objects become old enough) and can potentially make your young pauses longer; increasing ParGCCardsPerStrideChunk might indeed help a bit. But this is highly speculative, and you would need a pretty bad allocation pattern, repeated consistently, for all of this to happen. Without logs, I highly doubt that.
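
    The reachability argument can be played through with a toy mark phase. This is only a sketch over a hand-built graph of strings, not what a real collector does; ToyReachability, sampleGraph and the node names are invented for illustration, with edges pointing from the referencing object to the referenced one.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy mark phase: everything reachable from the roots is live, the rest is garbage.
final class ToyReachability {

    /** Returns every node reachable from the roots by following outgoing references. */
    static Set<String> mark(Map<String, List<String>> refs, List<String> roots) {
        Set<String> live = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String node = pending.pop();
            if (live.add(node)) { // first visit: follow its references
                pending.addAll(refs.getOrDefault(node, Collections.<String>emptyList()));
            }
        }
        return live;
    }

    /** Hand-built object graph after a() and b() have both returned. */
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("root", Collections.singletonList("example"));
        refs.put("example", Collections.singletonList("longLived"));
        refs.put("shortLivedA", Collections.singletonList("longLived")); // a(): points AT longLived
        refs.put("shortLivedB", Collections.singletonList("tempObject")); // b(): both are unreachable
        return refs;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = sampleGraph();
        System.out.println("live: " + mark(refs, Collections.singletonList("root")));

        // If Example itself becomes unreachable, nothing below it is traversed at all.
        refs.put("root", Collections.<String>emptyList());
        System.out.println("live: " + mark(refs, Collections.singletonList("root")));
    }
}
```

    Neither shortLivedA nor shortLivedB is ever visited, because nothing reachable points at them; a reference going out of a short-lived object to longLived does not keep the short-lived object alive.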