Tags: java, garbage-collection, jvm, shenandoah

Shenandoah self healing barriers


The title pretty much says it all - what are these self healing barriers and why are they important in Shenandoah 2.0?


Solution

  • This explanation builds on the first part and the second part of some earlier answers I wrote about Shenandoah 2.0.

    To really answer this question we need to look at how the load reference barrier is implemented and how a GC cycle acts, in general.

    When a GC cycle is triggered, it first chooses the regions with the most garbage; i.e. the regions in the collection set contain very few live objects (this will matter later). The simplest way to understand this topic is via an example. Suppose this is the layout that currently exists in a certain region:

    refA refB            
        |               
    ---------
    |  mark |                    
    ---------          
    | i = 0 |          
    | j = 0 |          
    --------- 
    

    There is an object that exists in the region and there are two references pointing to it: refA and refB. GC kicks in and this region is chosen to be garbage collected. At the same time there are active threads in the application that try to access this object via refA and refB. Since this object is alive, at some point it needs to be evacuated to a new region (this is how Shenandoah compacts the heap).

    So: GC is active and, at the same time, we read via refA/refB. When we do this reading we step on the load-reference-barrier, implemented here. Notice how internally it has some "filters" (via a bunch of if/else statements). Specifically:

    • it checks if "evacuation is currently in progress". This is done via a thread local flag that is set when evacuation first starts. Let's suppose the answer to this is "yes".

    • it checks if the object that we are currently operating on is in the "collection-set", i.e. it lives in a region that was chosen for evacuation. Since we reached it through a live reference, the object is alive and has to be moved. Let's suppose this is "yes" also.

    • the last check is to find out if this object was already "copied" to a different region (it was evacuated). Let's suppose the answer to this is "no", i.e. obj == fwd.
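The three filters above can be sketched as a small Java model. This is only an illustration under stated assumptions, not the HotSpot code (which is C++): `LrbFilterSketch`, `Obj`, `Outcome`, `classify`, the static flag, and the `collectionSet` are all hypothetical names invented here, and a plain set stands in for "the object's region is in the collection set".

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the three barrier filters. All names are hypothetical.
final class LrbFilterSketch {

    static final class Obj {
        Obj forwardee = this; // obj == forwardee means "not evacuated yet"
        int i, j;
    }

    enum Outcome { NO_EVACUATION, NOT_IN_COLLECTION_SET, ALREADY_COPIED, MUST_COPY }

    // Stand-in for the thread-local "evacuation in progress" flag.
    static boolean evacuationInProgress;

    // Stand-in for "this object's region is in the collection set".
    static final Set<Obj> collectionSet = new HashSet<>();

    // Applies the filters in the order described in the text.
    static Outcome classify(Obj obj) {
        if (!evacuationInProgress) {
            return Outcome.NO_EVACUATION;          // filter 1: nothing to do
        }
        if (!collectionSet.contains(obj)) {
            return Outcome.NOT_IN_COLLECTION_SET;  // filter 2: object will not move
        }
        if (obj.forwardee != obj) {
            return Outcome.ALREADY_COPIED;         // filter 3: fwd != obj
        }
        return Outcome.MUST_COPY;                  // all filters passed: obj == fwd
    }

    public static void main(String[] args) {
        Obj obj = new Obj();
        evacuationInProgress = true;
        collectionSet.add(obj);
        System.out.println(classify(obj)); // prints MUST_COPY
    }
}
```

The scenario assumed in the text (yes, yes, no) corresponds to the `MUST_COPY` outcome in this sketch.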

    At this point in time, a few things happen. First, a copy is created and the mark of the old object becomes the forwardee (a pointer to the new copy):

        refA refB            
            |               
     --------------      ---------
     |  forwardee | ---- | mark |            
     --------------      ---------    
        | i = 0 |        | i = 0 |  
        | j = 0 |        | j = 0 |  
        ---------        ---------
    

    Only later in the code would refA and refB be updated to point to the new (copied) object. But that means an interesting thing. It means that until refA and refB are actually made to point to the new object, the object that they currently point to is in the "collection set". So, if GC is active and even if the forwardee has been established, the load-reference-barrier still needs to do some work on every read.
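To make the cost described above concrete, here is a minimal single-threaded sketch of a barrier that resolves the forwardee but never writes the reference back. Everything in it is an assumption made for illustration: `NonHealingBarrierSketch`, `Obj`, `load`, the `inCollectionSet` flag, and the `slowPathHits` counter are hypothetical, and an `AtomicReference` stands in for a reference slot such as refA.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model, single-threaded. All names are hypothetical.
final class NonHealingBarrierSketch {

    static final class Obj {
        Obj forwardee = this;     // stands in for the forwarding pointer in the mark
        boolean inCollectionSet;  // stands in for region membership
        int i, j;
    }

    // Counts how often a load had to go past the fast-path filters.
    static int slowPathHits;

    // Load through a reference slot without ever updating the slot.
    static Obj load(AtomicReference<Obj> slot) {
        Obj obj = slot.get();
        if (obj == null || !obj.inCollectionSet) {
            return obj;                 // fast path: nothing to resolve
        }
        slowPathHits++;
        if (obj.forwardee == obj) {     // not copied yet: evacuate now
            Obj copy = new Obj();
            copy.i = obj.i;
            copy.j = obj.j;
            obj.forwardee = copy;
        }
        return obj.forwardee;           // the slot still points at the old object
    }

    public static void main(String[] args) {
        Obj old = new Obj();
        old.inCollectionSet = true;
        AtomicReference<Obj> refA = new AtomicReference<>(old);
        load(refA);
        load(refA);
        load(refA);
        System.out.println(slowPathHits); // every one of the three loads took the slow path
    }
}
```

In this model each read via refA pays for the slow path again, because refA keeps pointing at an object in the collection set.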

    So the very smart people behind Shenandoah said this: why not update the references right there, immediately after the forwardee has been established (or when the forwardee is already known because of other references)? And this is exactly what they did.

    Let's suppose we get back to our initial drawing:

    refA refB            
        |               
    ---------
    |  mark |                    
    ---------          
    | i = 0 |          
    | j = 0 |          
    --------- 
    

    And again, we "enable" all of the filters:

    • there is a Thread that reads via refA

    • GC is active

    • the object behind refA and refB is alive.

    This is what will happen with "self healing barriers":

           refB             refA
            |                |
     --------------      ---------
     |  forwardee | ---- | mark |            
     --------------      ---------    
        | i = 0 |        | i = 0 |  
        | j = 0 |        | j = 0 |  
        ---------        ---------
    

    The difference is obvious: refA was moved to point to the new object via CAS, on the spot. If there is going to be a read again via refA (GC is still active), this will result in a much faster load-reference-barrier execution. Why? Because refA now points to an object that is not in the "collection set", so the barrier exits at the second filter.

    But this also means that if we read via refB and see that fwd != obj, the code can do the same trick and update refB in place, at the time the first read happens via refB.
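The self-healing step can be sketched in Java with a compare-and-set on the reference slot. This is a rough model under assumptions, not the actual Shenandoah implementation: `SelfHealingBarrierSketch`, `Obj`, `load`, `inCollectionSet`, and `slowPathHits` are hypothetical names, an `AtomicReference` plays the role of refA/refB, and the demo is single-threaded.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of a self-healing load barrier. All names are hypothetical.
final class SelfHealingBarrierSketch {

    static final class Obj {
        Obj forwardee = this;     // stands in for the forwarding pointer in the mark
        boolean inCollectionSet;  // stands in for region membership
        int i, j;
    }

    // Counts loads that got past the fast-path filters (single-threaded demo only).
    static int slowPathHits;

    static Obj load(AtomicReference<Obj> slot) {
        Obj obj = slot.get();
        if (obj == null || !obj.inCollectionSet) {
            return obj;                 // fast path: slot already points at a safe object
        }
        slowPathHits++;
        Obj fwd = obj.forwardee;
        if (fwd == obj) {               // not copied yet: evacuate now
            fwd = new Obj();
            fwd.i = obj.i;
            fwd.j = obj.j;
            obj.forwardee = fwd;
        }
        // Self-heal: swing the slot to the new copy, but only if it still
        // holds the old object. A failed CAS means the slot changed meanwhile.
        slot.compareAndSet(obj, fwd);
        return fwd;
    }

    public static void main(String[] args) {
        Obj old = new Obj();
        old.inCollectionSet = true;
        AtomicReference<Obj> refA = new AtomicReference<>(old);
        AtomicReference<Obj> refB = new AtomicReference<>(old);
        load(refA);                     // copies the object and heals refA
        load(refA);                     // fast path this time
        load(refB);                     // sees fwd != obj and heals refB
        System.out.println(refA.get() == refB.get());
    }
}
```

After the first read through each slot, later reads through that slot stop at the collection-set filter, which is the speed-up described above.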

    This improves performance according to the people familiar with the matter, and I trust them.