Search code examples
javamemorygarbage-collection

Why does major GC take considerably more time than minor GC in Java?


Generally, GC in Java works by traversing object graph from Garbage Collection Roots.

So why does Major GC takes considerably more time than Minor GC? They both have to trigger stop-the-world event. They they both have to traverse whole object graph. Minor GC can't traverse only young generation objects, because what if the object is referenced by an object from an old gen? In that case we won't be able to know that is being referenced without traversing ALL objects.

Or is it only because of amount of moving bytes in memory, and not because of traversing object graph? And then, wouldn't old gen be less likely to have dead objects than young gen, resulting in overall moving less bytes while major GC-ing, then minor GC-ing? If so, is traversing object graph time negligible compared to moving bytes in memory, or it takes a good part of GC-ing?

I couldn't find that info on the internet. Everywhere I look, they just state that it is fast, or say that it is faster because "young gen in smaller". But size of generation does not really matter: we still should traverse whole object graph. Or does speed difference exist only because of moving bytes in memory?


Solution

  • I am assuming you have a typo and you are asking why minor GCs would be faster than major GCs.

    Minor GC can't traverse objects only young generation objects, because what if object is referenced by an object from an old gen. I that case we won't be able to know that is being referenced without traversing ALL objects.

    http://abiasforaction.net/understanding-jvm-garbage-collection-part-3/ is a pretty good source here, but to summarize: it is a good heuristic (called the weak generational hypothesis) that there are very few references from the old gen to the new gen. (To have a reference from the old gen to the new gen, you have to modify an existing object; you can't just create new objects, or they would be in the new gen.)

    Given that heuristic, the JVM can maintain a data structure which allows it to narrow down how much of the old generation it has to traverse, usually by a significant amount. This can be done simply by having every change to a reference in an old gen object update some data structure. This is where the savings come in; minor GCs only have to traverse those old gen objects that have had one of their references (e.g. an object field) updated to point to a young object.