java multithreading jvm java-memory-model

Do I need to use volatile?

Consider the following code:

import java.util.concurrent.locks.ReentrantLock;

public class MyDataStructure {

    int size;
    final ReentrantLock lock = new ReentrantLock();

    public void update() {
        lock.lock();
        try {
            // do cool stuff
            size++;
        } finally {
            lock.unlock();
        }
    }

    public int size() {
        return size;
    }
}

As far as I understand, ReentrantLock imposes a happens-before relationship, so size++ has to be written through to main memory and made visible to the caches of other threads. Hence there is no need to declare size as volatile, and for the same reason the method size() can simply return size.

Is my understanding correct?

Solution

No.

Other threads may see a stale (old) value of size.

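When another thread executes size(), nothing tells it to make sure it is not reading an old value, such as one held in a register or in another CPU's cache. The happens-before edge that a ReentrantLock provides runs from an unlock() to a later lock() on the same lock; size() never acquires the lock, so no such edge covers its read. Hoisting by the JVM is also worth mentioning.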

For example, a loop that calls size() in one thread may never exit if size is only changed from other threads, that is, if nothing inside the loop calls update() or modifies size directly.

while (size() == 0) {
   ...
}

The JIT compiler (at least the optimizing C2 compiler, historically called the server compiler) may happily hoist the read of size out of the loop, because it sees that nothing inside the loop changes it.

Update about alternatives:

volatile could help if only one thread will ever write to size, including through update(). Otherwise size is not protected: size++ first reads and then writes the variable, so two threads can both read the same fresh value "at the same time", both add 1, and leave the total at +1 instead of +2. Even with a single writer I would argue against it, because that can change in the future and it is a fairly subtle thing to have in the code; a future developer (including yourself) would stand a great risk of missing it.
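
To make the lost-update argument concrete, here is a minimal sketch that replays the unlucky interleaving by hand in a single thread, so the outcome is deterministic. The class and method names are made up for illustration; the two local variables stand in for two threads that have each done the read half of size++ before either one writes back.

```java
// Hypothetical illustration: size++ on a volatile field is still
// a read followed by a write, so two writers can interleave.
class VolatileLostUpdate {

    static volatile int size = 0;

    // Replays one unlucky interleaving of two size++ operations.
    static int replayInterleaving() {
        size = 0;
        int seenByA = size;   // "thread A" reads 0
        int seenByB = size;   // "thread B" reads 0 before A has written
        size = seenByA + 1;   // "thread A" writes 1
        size = seenByB + 1;   // "thread B" also writes 1, so A's update is lost
        return size;          // 1, although two increments ran
    }
}
```

With real threads this interleaving is only a possibility, not a certainty, which is exactly what makes the bug hard to spot in testing.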

So one option is to take the lock in size() as well. That gives you the extra features of locks, and you could even use a read-write lock (ReentrantReadWriteLock), which allows many readers but only one writer. It is more verbose than the other alternatives. It might also play well with virtual threads if the guarded operation were costly (it isn't in this case).
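
A minimal sketch of the read-write lock variant could look like the following. The class name and the runWriters helper are assumptions added for illustration; only ReentrantReadWriteLock and its readLock()/writeLock() methods come from the JDK.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical variant of MyDataStructure: reads and writes both take a lock.
class ReadWriteLockedStructure {

    private int size;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    void update() {
        lock.writeLock().lock();      // exclusive: one writer at a time
        try {
            size++;
        } finally {
            lock.writeLock().unlock();
        }
    }

    int size() {
        lock.readLock().lock();       // shared: many readers may hold it together
        try {
            return size;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Illustration helper: starts the given number of writer threads,
    // waits for them, and returns the final size.
    static int runWriters(int threads, int updatesPerThread) {
        ReadWriteLockedStructure s = new ReadWriteLockedStructure();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < updatesPerThread; j++) {
                    s.update();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return s.size();
    }
}
```

Because the reader acquires a lock that the writers release, every completed update() is visible to a later size() call.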

Another option is the traditional, simple way: add synchronized to both methods, or wrap their bodies in a synchronized (this) { ... } block. That provides, beyond doubt, guarantees about both mutual exclusion and memory visibility. The only real drawback is that it uses the object instance itself as the mutex/monitor, which may not be granular enough if other, unrelated variables need protection as well, or if other code holding a reference to this object synchronizes on it. You could then add a dedicated monitor object for each piece of state, one field each:

private final Object sizeMutex = new Object();

... and synchronize on it with synchronized (sizeMutex) { ... } where needed. This gets more and more verbose and complex, but it stays fairly obvious and understandable. The biggest risk is the possibility of introducing a deadlock. Most likely you don't need to be that granular, but it is good to think about.
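
As a rough sketch of that pattern, assuming size is the only state guarded by sizeMutex (the class name and the runTwoWriters helper are made up for illustration):

```java
// Hypothetical variant of MyDataStructure using a private monitor object.
class MutexGuardedStructure {

    // Private, so no outside code can synchronize on it by accident.
    private final Object sizeMutex = new Object();
    private int size;

    void update() {
        synchronized (sizeMutex) {
            size++;
        }
    }

    int size() {
        // The reader must use the same monitor, otherwise there is no
        // visibility guarantee for the value it reads.
        synchronized (sizeMutex) {
            return size;
        }
    }

    // Illustration helper: two threads each call update() the given number of times.
    static int runTwoWriters(int updatesPerThread) {
        MutexGuardedStructure s = new MutexGuardedStructure();
        Runnable work = () -> {
            for (int i = 0; i < updatesPerThread; i++) {
                s.update();
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return s.size();
    }
}
```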

The simplest option in this particular case, though, is to use AtomicInteger (or the related classes for other primitives, or AtomicReference for entire objects), as others recommend in the comments.

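import java.util.concurrent.atomic.AtomicInteger;

public class MyDataStructure {

    private final AtomicInteger size = new AtomicInteger();

    public int size() {
        return size.get(); // has the memory effects of a volatile read
    }

    public void update() {
        size.incrementAndGet(); // atomic read-modify-write, no lost updates
    }
}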