Tags: java, multithreading, final, java-memory-model, jls

Java final fields: is "taint" behavior possible with the current JLS


I'm currently trying to understand this JLS section on final fields.

To understand the text in the JLS better I'm also reading The Java Memory Model by Jeremy Manson (one of the creators of the JMM).

The paper contains the example that got me interested: if an object o with final fields is made visible to another thread t twice:

  • first "improperly" before o's constructor finishes
  • next "properly" after o's constructor finishes

then t can see a partially constructed o even when it accesses o only via the "properly" published path.

Here is the part from the paper:

Figure 7.3: Example of Simple Final Semantics

f1 is a final field; its default value is 0

Thread 1:
    o.f1 = 42;
    p = o;
    freeze o.f1;
    q = o;

Thread 2:
    r1 = p;
    i = r1.f1;
    r2 = q;
    if (r2 == r1)
        k = r2.f1;

Thread 3:
    r3 = q;
    j = r3.f1;



We assume r1, r2 and r3 do not see the value null. i and k can be 0 or 42, and j must be 42.
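For readers who prefer real code to the figure's pseudocode, here is a minimal Java sketch of the same three threads. The names TaintDemo, Holder and runOnce are hypothetical and do not come from the paper or the JLS; the value -1 is an assumed marker for "the reference was null, so the field was not read". The sketch only mirrors the shape of Figure 7.3. Whether a 0 is ever observed depends on the JVM and hardware, so a run that never prints a 0 proves nothing about what the specification allows.

```java
// Hypothetical demo class mirroring Figure 7.3; names are illustrative only.
final class TaintDemo {
    static final class Holder {
        final int f1;

        Holder() {
            f1 = 42;   // o.f1 = 42
            p = this;  // p = o: improper publication, before the freeze
        }              // freeze o.f1 happens when the constructor exits
    }

    // Deliberately non-volatile: both references are read through data races.
    static Holder p;
    static Holder q;

    /** Runs the three threads once and returns {i, k, j}; -1 means "not read". */
    static int[] runOnce() {
        p = null;
        q = null;
        int[] out = {-1, -1, -1};

        Thread t1 = new Thread(() -> {
            Holder o = new Holder();
            q = o;                                        // q = o: proper publication
        });
        Thread t2 = new Thread(() -> {
            Holder r1 = p;
            if (r1 != null) out[0] = r1.f1;               // i = r1.f1
            Holder r2 = q;
            if (r2 != null && r2 == r1) out[1] = r2.f1;   // k = r2.f1
        });
        Thread t3 = new Thread(() -> {
            Holder r3 = q;
            if (r3 != null) out[2] = r3.f1;               // j = r3.f1
        });

        t1.start();
        t2.start();
        t3.start();
        try {
            // join() orders the threads' writes to out before the reads below.
            t1.join();
            t2.join();
            t3.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        int zerosInI = 0, zerosInK = 0, badJ = 0;
        for (int n = 0; n < 10_000; n++) {
            int[] r = runOnce();
            if (r[0] == 0) zerosInI++;
            if (r[1] == 0) zerosInK++;
            if (r[2] != -1 && r[2] != 42) badJ++;  // forbidden by the final field rules
        }
        System.out.println("i == 0: " + zerosInI + ", k == 0: " + zerosInK
                + ", j not 42: " + badJ);
    }
}
```

Thread 3 only ever reads q, which is written after the constructor returns, so j is either unread or 42. Thread 2 may read p, which is written inside the constructor, so neither i nor k is guaranteed to be 42.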


Consider Figure 7.3. We will not start out with the complications of multiple writes to final fields; a freeze, for the moment, is simply what happens at the end of a constructor. Although r1, r2 and r3 can see the value null, we will not concern ourselves with that; that just leads to a null pointer exception.

...

What about the read of q.f1 in Thread 2? Is that guaranteed to see the correct value for the final field? A compiler could determine that p and q point to the same object, and therefore reuse the same value for both p.f1 and q.f1 for that thread. We want to allow the compiler to remove redundant reads of final fields wherever possible, so we allow k to see the value 0.

One way to conceptualize this is by thinking of an object being “tainted” for a thread if that thread reads an incorrectly published reference to the object. If an object is tainted for a thread, the thread is never guaranteed to see the object’s correctly constructed final fields. More generally, if a thread t reads an incorrectly published reference to an object o, thread t forever sees a tainted version of o without any guarantees of seeing the correct value for the final fields of o.
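The redundant-read elimination that the paper wants to permit can be illustrated with a small sketch. The class HoistedReadDemo and its methods are hypothetical, and asTransformed is only an assumed example of the kind of rewrite the paper describes, not the output of any particular compiler. Both methods return {i, k}, with -1 as an assumed marker for "k was not assigned".

```java
// Hypothetical illustration of reusing a final field read; not real JIT output.
final class HoistedReadDemo {
    static final class Holder {
        final int f1;

        Holder(int value) {
            f1 = value;
        }
    }

    /** Thread 2 of Figure 7.3 as written: the field is read once via p and once via q. */
    static int[] asWritten(Holder p, Holder q) {
        Holder r1 = p;
        int i = r1.f1;
        Holder r2 = q;
        int k = -1;
        if (r2 == r1) {
            k = r2.f1;   // second read of the same final field
        }
        return new int[] {i, k};
    }

    /** The same code after the second read is replaced by the first read's value. */
    static int[] asTransformed(Holder p, Holder q) {
        Holder r1 = p;
        int cached = r1.f1;   // the only read of f1, performed via the reference from p
        int i = cached;
        Holder r2 = q;
        int k = -1;
        if (r2 == r1) {
            k = cached;       // r2 and r1 are the same object, so the value is reused
        }
        return new int[] {i, k};
    }
}
```

In a single thread the two versions are indistinguishable. Across threads, if the reference read from p was published before the freeze, cached may hold 0, and the transformed code then assigns that 0 to k even though q was published properly. Allowing this rewrite is why k is permitted to be 0.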

I tried to find in the current JLS anything that explicitly allows or forbids such behavior, but all I found is that:

An object is considered to be completely initialized when its constructor finishes. A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields.

Is such behavior allowed in the current JLS?


Solution

  • Yes, such behavior is allowed.

    It turns out that a detailed explanation of this exact case is available on the personal page of William Pugh (another JMM author): New presentation/description of the semantics of final fields.

    Short version:

    • Section 17.5.1, "Semantics of final Fields", of the JLS defines special rules for final fields.
      These rules let us establish an additional happens-before relation between the initialization of a final field in a constructor and a read of the field in another thread, even if the object is published via a data race.
      This additional happens-before relation requires that every path from the field initialization to its read in another thread include a special chain of actions:

      w  ʰᵇ ► f  ʰᵇ ► a  ᵐᶜ ► r₁  ᵈᶜ ► r₂, where:
      • w is a write to the final field in a constructor
      • f is the "freeze" action, which happens when the constructor exits
      • a is a publication of the object (e.g. saving it to a shared variable)
      • r₁ is a read of the object's address in a different thread
      • r₂ is a read of the final field in the same thread as r₁
      • ʰᵇ, ᵐᶜ and ᵈᶜ denote the happens-before, memory chain and dereference chain orders
    • the code in the question has a path from o.f1 = 42 to k = r2.f1 which doesn't include the required freeze o.f1 action:

      o.f1 = 42  ʰᵇ ► { freeze o.f is missing }  ʰᵇ ► p = o  ᵐᶜ ► r1 = p  ᵈᶜ ► k = r2.f1

      As a result, o.f1 = 42 and k = r2.f1 are not ordered with happens-before ⇒ we have a data race and k = r2.f1 can read 0 or 42.

    A quote from New presentation/description of the semantics of final fields:

    In order to determine if a read of a final field is guaranteed to see the initialized value of that field, you must determine that there is no way to construct the partial orders  ᵐᶜ ► and  ᵈᶜ ► without providing the chain w  ʰᵇ ► f  ʰᵇ ► a  ᵐᶜ ► r₁  ᵈᶜ ► r₂ from the write of the field to the read of that field.

    ...

    The write in Thread 1 and read in Thread 2 of p are involved in a memory chain. The write in Thread 1 and read in Thread 2 of q are also involved in a memory chain. Both reads of f see the same variable. There can be a dereference chain from the reads of f to either the read of p or the read of q, because those reads see the same address. If the dereference chain is from the read of p, then there is no guarantee that r5 will see the value 42.

    Notice that for Thread 2, the dereference chain orders r2 = p  ᵈᶜ ► r5 = r4.f, but does not order r4 = q  ᵈᶜ ► r5 = r4.f. This reflects the fact that the compiler is allowed to move any read of a final field of an object o to immediately after the very first read of the address of o within that thread.
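One practical reading of the chain requirement is that no publication of the object may come before the freeze, which means this must not escape from the constructor. The sketch below is a minimal illustration under that assumption. The class SafeHolder, its field published, and the methods create and readOrDefault are hypothetical names, not part of the JLS or of Pugh's text.

```java
// Hypothetical sketch: the only publication happens after the constructor exits.
final class SafeHolder {
    // Deliberately non-volatile: readers still reach this field through a data race.
    static SafeHolder published;

    final int f1;

    private SafeHolder() {
        f1 = 42;   // w: write to the final field
    }              // f: freeze when the constructor exits; 'this' is never leaked

    static SafeHolder create() {
        SafeHolder o = new SafeHolder();
        published = o;   // a: the only publication, and it comes after the freeze
        return o;
    }

    /** Returns f1 if a reference is visible, otherwise the supplied default. */
    static int readOrDefault(int fallback) {
        SafeHolder r = published;            // r1: read of the object's address
        return r == null ? fallback : r.f1;  // r2: read of the final field
    }
}
```

With this shape, every path from the write of f1 to a read in another thread passes through the freeze, so a reader that sees a non-null reference sees 42, although it may still see null because published itself is racy.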