java weak-references soft-references phantom-reference

Rationale for Soft-/Weak-/PhantomReference clearing references to objects which have a reference to the tracked object


The documentation for Soft-, Weak- and PhantomReference all include a line similar to the following (taken from PhantomReference):

At that time it will atomically clear all phantom references to that object and all phantom references to any other phantom-reachable objects from which that object is reachable.

The part which is confusing me is the one about the other phantom-reachable objects.

If I understand it correctly this describes this case:
Objects:

  • A
  • B

References:

  • ->: Strong reference
  • -P->: Phantom reference

    -> A
    -P-> B -> A

So for some reason the garbage collector has not yet determined that B is only phantom-reachable. Now if A becomes phantom-reachable and the garbage collector detects this, it is required (according to the doc quoted above) to also clear the phantom reference to B.

Is there any reason why the documentation requires this? It seems that if other vendors were to develop a JVM, this would be quite a burden.


Solution

  • First, note that this sentence was copied from the documentation of soft and weak references into the documentation of phantom references for Java 9, to accommodate changes made in that version. It is not a good fit for phantom references, so the rationale behind it is better explained using soft and weak references.

    Suppose you have the following situation:

    (weak)→ A
    (weak)→ B (strong)→ A
    

    Technically, both A and B are weakly reachable, but we can change this by invoking the get() method on either weak reference to retrieve a strong reference to its referent.

    When we do this on the first weak reference to retrieve a strong reference to A, the object B stays weakly reachable. But when we do this to get a strong reference to B, the object A also becomes strongly reachable, due to the strong reference from B to A.

    Therefore, we have the rule that if the weak reference to A gets cleared, the weak reference to B must be cleared too. Otherwise, it would be possible to retrieve a strong reference to A via B even though the weak reference to A has already been cleared. And to be on the safe side, this must happen atomically, so there is no race condition that would allow retrieving a reference to B between the clearing of the two references.
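
The path from B to A can be made visible with a small sketch. The class names `A`, `B` and `ReachabilityDemo` are made up for illustration, and both objects are deliberately kept strongly reachable (via `Reference.reachabilityFence`, Java 9+) so the result does not depend on when the collector runs.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

class ReachabilityDemo {
    static class A {}

    static class B {
        final A a;                  // strong reference from B to A
        B(A a) { this.a = a; }
    }

    /** Returns true if the object behind weakA can also be obtained through weakB. */
    static boolean reachesAViaB() {
        A a = new A();
        B b = new B(a);
        WeakReference<A> weakA = new WeakReference<>(a);
        WeakReference<B> weakB = new WeakReference<>(b);

        // get() hands out a strong reference to the referent (or null once cleared).
        B strongB = weakB.get();
        A viaB = (strongB == null) ? null : strongB.a;

        boolean same = viaB != null && viaB == weakA.get();

        // Keep both objects strongly reachable up to this point.
        Reference.reachabilityFence(a);
        Reference.reachabilityFence(b);
        return same;
    }

    public static void main(String[] args) {
        System.out.println("A reachable via weak reference to B: " + reachesAViaB());
    }
}
```

If the weak reference to A could be cleared while the one to B still returned B, `strongB.a` would hand out an object whose own weak reference already claims it is gone; clearing both together rules that out.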

    As said, this is less relevant for phantom references, as those do not allow retrieving the referent, but there is no reason to treat them differently.
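
That phantom references never hand out their referent can be checked directly: `PhantomReference.get()` is specified to always return null. A minimal sketch, with `PhantomGetDemo` being a name chosen only for this example:

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

class PhantomGetDemo {
    /** Returns what get() yields while the referent is still strongly reachable. */
    static Object phantomGet() {
        Object referent = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(referent, queue);

        Object result = phantom.get();          // always null for phantom references

        Reference.reachabilityFence(referent);  // referent is alive up to here
        return result;
    }

    public static void main(String[] args) {
        System.out.println("phantom.get() = " + phantomGet());
    }
}
```

Because there is no way to get from the reference back to the object, the resurrection problem described for weak references cannot arise here, which is why the rule matters less in this case.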

    The point here is that this is not an actual burden, given how garbage collectors actually work. They have to traverse all live references, i.e. strongly reachable objects, and everything not encountered is garbage by elimination. So when encountering a weak reference object during the traversal, the collector won’t traverse the referent, but will remember the reference object. Once it has completed the traversal, it runs through all encountered reference objects and checks whether the referent has been marked as reachable through a different path. If not, the reference object is cleared and linked for enqueuing.
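
The traversal described above can be sketched as a toy model. This is not how any particular JVM is implemented; `ToyCollector`, `Obj` and `weakRef` are hypothetical names, and the "heap" is just a small object graph built by hand. The sketch runs the mark-then-clear procedure on the two-weak-references graph and on the graph from the question.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

class ToyCollector {
    /** Toy heap object; a non-null referent makes it a weak reference object. */
    static class Obj {
        final String name;
        final List<Obj> strong = new ArrayList<>();  // outgoing strong references
        Obj referent;                                // weak edge, never traversed

        Obj(String name) { this.name = name; }
    }

    /** Creates a toy weak reference object pointing at the given referent. */
    static Obj weakRef(String name, Obj referent) {
        Obj ref = new Obj(name);
        ref.referent = referent;
        return ref;
    }

    /** Marks everything strongly reachable from the roots, then clears
     *  every discovered weak reference whose referent was not marked. */
    static List<String> collect(List<Obj> roots) {
        Set<Obj> marked = new HashSet<>();
        List<Obj> discovered = new ArrayList<>();
        Queue<Obj> pending = new ArrayDeque<>(roots);

        while (!pending.isEmpty()) {
            Obj o = pending.poll();
            if (!marked.add(o)) continue;               // already visited
            if (o.referent != null) discovered.add(o);  // remember it, skip the referent
            pending.addAll(o.strong);
        }

        List<String> cleared = new ArrayList<>();
        for (Obj ref : discovered) {
            if (!marked.contains(ref.referent)) {
                ref.referent = null;                    // clear the weak reference
                cleared.add(ref.name);
            }
        }
        return cleared;
    }

    /** (weak)→ A and (weak)→ B (strong)→ A */
    static List<String> bothWeak() {
        Obj a = new Obj("A");
        Obj b = new Obj("B");
        b.strong.add(a);
        return collect(List.of(weakRef("refA", a), weakRef("refB", b)));
    }

    /** (strong)→ A and (weak)→ B (strong)→ A, as in the question */
    static List<String> questionExample() {
        Obj a = new Obj("A");
        Obj b = new Obj("B");
        b.strong.add(a);
        return collect(List.of(a, weakRef("refB", b)));
    }

    public static void main(String[] args) {
        System.out.println(bothWeak());         // [refA, refB]
        System.out.println(questionExample());  // [refB]
    }
}
```

In this model the strong edge from B to A is never followed, because B itself is never marked. Both references in the first graph are cleared in the same pass without any extra bookkeeping, which is the sense in which the documented rule costs nothing.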

    To address your example:

    (strong)→ A
    (weak)→ B (strong)→ A
    

    Here, B is weakly reachable regardless of the strong reference to A. When you eliminate the strong reference to A, B is still weakly reachable and its weak reference may get cleared and enqueued. Formally, A is now weakly reachable, but the JVM will never detect that without detecting that B is weakly reachable too. The only way to detect that A is weakly reachable would be to traverse the reference graph starting at the weakly reachable B. But no implementation does this. The garbage collector will simply clear the weak reference to B and that’s it.