Tags: java, multithreading, thread-safety, race-condition

Is it OK to read a variable that could potentially be written at the same time?


This might sound a little silly, but I'm not proficient in Java, so I wanted to make sure:

Suppose there are two code points:

I:

if (_myVar == null) 
{
    return; 
}

II:

synchronized (_myLock) 
{
    _myVar = MyVarFactory.create(/* real params */);
}

Edit: Assume that _myVar is a complex object (i.e. not a boolean, int, or long) but an instance of a fully fledged Java class that has some parent classes, etc.

Assuming that I and II can run on separate threads at the same time, I think this would be a "data race" in C++, but I am not sure about the situation in Java.


Solution

  • TL;DR: No, not okay.

    Explanation:

    The relevant documentation is the Java Memory Model (JMM).

    Conceptually, the JMM gives the JVM the freedom to behave as if it made a local cached copy of every field of every object for each individual thread.

    Then, it hands each thread a coin. Any time the thread reads or writes a field, it flips this coin. On heads, it uses its local cache. On tails, it syncs both its local cache and the 'real' copy.

    Furthermore, the coin is evil. It is not actually random, but it is unreliable. It may flip tails every time today, every time on the test machine, and every time during the first week of the beta. And then, just when you're giving a demo to that important potential customer, it starts flipping heads on you, reliably, all day, every time. Just... all of a sudden.

    The name of the game is simple: If the behaviour of your program depends on the result of the evil coin flip, you lose.
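    A classic way the evil coin shows up in practice is a "stop" flag that one thread writes and another thread polls. The sketch below (class and method names are illustrative, not from the question) marks the flag volatile so the worker is guaranteed to eventually see the stop request; without volatile, the JIT is allowed to hoist the read out of the loop, and the worker may spin forever.

```java
import java.util.concurrent.TimeUnit;

public class StopFlagDemo {
    // 'volatile' guarantees the worker eventually sees the write below.
    // Remove it and the JIT may hoist the read out of the loop, so the
    // worker could spin forever: the evil coin landing on heads every time.
    private static volatile boolean running;

    /** Starts a busy worker, asks it to stop, and reports whether it did. */
    public static boolean runAndStop(long timeoutMillis) {
        running = true;                       // happens-before worker.start()
        Thread worker = new Thread(() -> {
            long spins = 0;
            while (running) {                 // volatile read on every iteration
                spins++;                      // busy work, no synchronization
            }
        });
        worker.setDaemon(true);               // don't keep the JVM alive if it hangs
        worker.start();
        try {
            TimeUnit.MILLISECONDS.sleep(50);  // let the worker get going
            running = false;                  // volatile write: the stop request
            worker.join(timeoutMillis);       // wait at most timeoutMillis
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + runAndStop(5_000));
    }
}
```

    Note that removing volatile will often still "work" on a given machine, which is exactly the evil-coin problem: passing runs prove nothing.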

    Thus, either write code that doesn't care (hard), or write code that suppresses the flips (easier).

    In general, the easiest thing to do is to never have any fields that you concurrently write to and read from. This sounds impossible but is, in fact, quite easy: top-down frameworks like fork/join do all communication via the stack (method parameters and return values), and there is of course the old, tried-and-true trick of doing all communication via a channel that has excellent support for concurrent operations, such as a relational database like PostgreSQL or a message queue like RabbitMQ.
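    To make "communication via the stack" concrete, here is a minimal fork/join sketch (the summing task and the threshold value are illustrative choices) built on the JDK's RecursiveTask and ForkJoinPool. Each task receives its input through constructor parameters and hands back its result from compute(); no task writes a field that another task reads.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinSum {
    /** Sums data[from, to). Input arrives via the constructor, output leaves
     *  via compute()'s return value: no shared mutable fields. */
    static final class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 1_000; // arbitrary cutoff for this sketch
        private final long[] data;
        private final int from, to;

        SumTask(long[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (int i = from; i < to; i++) sum += data[i];
                return sum;
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                      // run the left half asynchronously
            long rightSum = right.compute();  // compute the right half in this thread
            return left.join() + rightSum;    // join() hands back the left result
        }
    }

    public static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(parallelSum(data)); // 5000050000
    }
}
```

    The array itself is shared, but it is only read by the tasks, and the writes that fill it happen before the task is submitted to the pool, which the java.util.concurrent package documents as a happens-before edge.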

    If you must use the same field from multiple threads concurrently, the only way to ensure the evil coin is not flipped is to establish happens-before relationships, abbreviated here as HBHA (Happens-Before/Happens-After); 'happens-before' is the official term used in the JMM. There are certain specific ways to set up such a relationship, and for those the JMM officially guarantees, for two lines of code, that one 'happens after' the other, which means the line that happens after will definitely observe the changes caused by the line that happens before. Without HBHA, the evil coin is flipped, and you may or may not see the change depending on the phase of the moon.
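    Happens-before is also transitive, which is what makes it practical: one reliable edge can carry along plain writes made earlier. The sketch below (names are illustrative) publishes a non-volatile int by writing it before a volatile flag; once the reader sees the flag, it is guaranteed to see the int too.

```java
public class VolatilePublish {
    private static int payload;             // plain field, NOT volatile
    private static volatile boolean ready;  // volatile flag that carries the edge

    /** Publishes payload from a writer thread and reads it back in the caller. */
    public static int publishAndRead() {
        payload = 0;
        ready = false;
        Thread writer = new Thread(() -> {
            payload = 42;  // (1) plain write
            ready = true;  // (2) volatile write; (1) hb (2) by program order
        });
        writer.start();
        while (!ready) {   // (3) volatile read that observes (2); (2) hb (3)
            Thread.yield();
        }
        return payload;    // (4) (3) hb (4), so by transitivity this sees 42
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead()); // 42
    }
}
```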

    The full list of ways to establish HBHA is lengthy, but the common ones are:

    • The natural: Two pieces of code running in the same thread have a natural HBHA relationship (program order). The JVM/CPU is actually free to reorder code and run things simultaneously if it wants to, but the JVM guarantees that the thread itself observes everything as if its own code ran strictly sequentially. Other threads get no such guarantee.
    • Starting threads: thread.start() is guaranteed to happen-before the first line of code within that thread.
    • synchronized: If a thread exits a synchronized block, then that happens-before any other thread entering a synchronized block that is synchronizing on the same object reference.
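    • volatile: All reads and writes of volatile fields occur in a single order that every thread agrees on; you don't choose that order, but it is reliable. A write to a volatile field happens-before every subsequent read of that same field.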
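    The sketch below (names and counts are illustrative) exercises the thread-start and synchronized edges from the list above, plus one more edge from the JMM not listed there: all actions of a thread happen-before another thread returns from join() on it.

```java
import java.util.ArrayList;
import java.util.List;

public class HappensBeforeEdges {
    private static final Object lock = new Object();
    private static int counter;    // guarded by 'lock'
    private static String config;  // plain field, written before start()

    public static int run(int threads, int incrementsPerThread) {
        counter = 0;
        config = "ready";          // plain write in the starting thread
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                // start() happens-before this line, so config is visible here.
                if (!"ready".equals(config)) {
                    throw new IllegalStateException("config not visible");
                }
                for (int i = 0; i < incrementsPerThread; i++) {
                    // Exiting this block happens-before the next entry on the
                    // same 'lock' object, so no increment is lost.
                    synchronized (lock) {
                        counter++;
                    }
                }
            });
            workers.add(w);
            w.start();
        }
        try {
            for (Thread w : workers) {
                w.join(); // a thread's actions happen-before the return from join()
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return counter; // visible here thanks to join()
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000)); // 40000
    }
}
```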

    In your code example, there is absolutely no HBHA going on, as I assume that the first snippet runs in one thread and the second snippet runs in another. Yes, the second snippet uses synchronized, but the first does not, and synchronized can only establish HBHA with other synchronized blocks (and only if they are synchronizing on the exact same object). Thus, you have no HBHA.

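    Therefore, the JMM gives the JVM the freedom to run your snippets such that the first snippet does not observe the update done by the second snippet (where _myVar is set to some instance), even if it CAN observe other changes the second thread made. Since _myVar refers to a complex object, there is a further hazard: the first thread may see a non-null reference while still seeing stale values in that object's non-final fields, because nothing orders the writes made during construction before the read.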

    SOLUTION: Set up HBHA. Use an AtomicReference, which does it for you; or declare _myVar volatile; or wrap the first snippet in synchronized (_myLock) as well; or forget all this and use a database, RabbitMQ, fork/join, or some other framework.
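    Here is a minimal sketch of the three in-code fixes side by side. MyVar, Holder, and the holder classes are hypothetical stand-ins for the question's types, not anything from a real library. useIfReady mirrors snippet I and reads the shared reference exactly once into a local, so the null check and the later use are guaranteed to see the same object.

```java
import java.util.concurrent.atomic.AtomicReference;

public class SafePublication {
    /** Hypothetical stand-in for the question's MyVar. */
    public static final class MyVar {
        public final String name;
        public MyVar(String name) { this.name = name; }
    }

    public interface Holder {
        void init(MyVar v);   // snippet II: publish the instance
        MyVar getOrNull();    // snippet I: read it
    }

    /** Option 1: AtomicReference; get() and set() have volatile semantics. */
    public static final class AtomicHolder implements Holder {
        private final AtomicReference<MyVar> ref = new AtomicReference<>();
        public void init(MyVar v) { ref.set(v); }
        public MyVar getOrNull() { return ref.get(); }
    }

    /** Option 2: a volatile field. */
    public static final class VolatileHolder implements Holder {
        private volatile MyVar myVar;
        public void init(MyVar v) { myVar = v; }
        public MyVar getOrNull() { return myVar; }
    }

    /** Option 3: the write AND the read synchronize on the same lock. */
    public static final class LockedHolder implements Holder {
        private final Object lock = new Object();
        private MyVar myVar; // guarded by 'lock'
        public void init(MyVar v) { synchronized (lock) { myVar = v; } }
        public MyVar getOrNull() { synchronized (lock) { return myVar; } }
    }

    /** Mirrors snippet I, reading the shared reference exactly once. */
    public static String useIfReady(Holder h) {
        MyVar v = h.getOrNull();
        if (v == null) {
            return "not ready";
        }
        return v.name; // v was published with a happens-before edge
    }
}
```

    AtomicReference and volatile are typically the lighter-weight choices for a single reference; the lock version is the natural fit if _myLock already guards other related state.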

    NB: There is pretty much no way to write tests that confirm evil coin flips are occurring. As a consequence, take seriously the advice to avoid sharing mutable fields between threads entirely, e.g. with fork/join, message queues, or databases: multithreaded code that shares fields has a tendency to be riddled with bugs that no tests can catch.
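    As an in-process analogue of the message-queue approach (a real broker like RabbitMQ works across processes; this sketch does not), a java.util.concurrent BlockingQueue lets a producer and a consumer exchange data without sharing any mutable field of their own. The sentinel value and queue capacity are arbitrary assumptions of this sketch.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChannelDemo {
    private static final int POISON = -1; // sentinel meaning "no more messages"

    /** Sends 1..n through a queue to a consumer thread and returns its sum. */
    public static long sumViaQueue(int n) {
        BlockingQueue<Integer> channel = new ArrayBlockingQueue<>(16);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Long> consumer = pool.submit(() -> {
                long sum = 0;                     // local to the consumer, never shared
                while (true) {
                    int msg = channel.take();     // blocks until a message arrives
                    if (msg == POISON) {
                        return sum;               // result travels back via the Future
                    }
                    sum += msg;
                }
            });
            for (int i = 1; i <= n; i++) {
                channel.put(i);                   // producer side; blocks if full
            }
            channel.put(POISON);
            return consumer.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumViaQueue(1_000)); // 500500
    }
}
```

    The queue and the Future do the happens-before bookkeeping internally (the java.util.concurrent package documents that putting an element into a BlockingQueue happens-before its removal by another thread), so the application code never has to reason about it.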