Tags: java, android, multithreading, volatile

Multithreaded access and variable cache of threads


I could probably find the answer by reading a complete chapter or book on multithreading, but I'd like a quicker answer. (I know a similar Stack Overflow question exists, but it doesn't answer this sufficiently.)

Assume there is this class:

public class TestClass {
   private int someValue;

   public int getSomeValue() { return someValue; }
   public void setSomeValue(int value) {  someValue = value; }
}

There are two threads (A and B) that access the instance of this class. Consider the following sequence:

  1. A: getSomeValue()
  2. B: setSomeValue()
  3. A: getSomeValue()

If I'm right, someValue must be volatile; otherwise the third step might not return the up-to-date value (because A may have a cached value). Is this correct?

Second scenario:

  1. B: setSomeValue()
  2. A: getSomeValue()

In this case, A will always get the correct value, because this is its first access, so it can't have a cached value yet. Is this right?

If a class is accessed only in the second way, there is no need for volatile/synchronization, or is there?

Note that this example is simplified; I'm actually wondering about particular member variables and methods in a complex class, not about whole classes (i.e. which variables should be volatile or have synchronized access). The main point is: if multiple threads access certain data, is synchronized access needed in every case, or does it depend on the way (e.g. the order) in which they access it?


After reading the comments, I'll try to present the source of my confusion with another example:

  1. From UI thread: threadA.start()
  2. threadA calls getSomeValue(), and informs the UI thread
  3. UI thread gets the message (in its message queue), so it calls: threadB.start()
  4. threadB calls setSomeValue(), and informs the UI thread
  5. UI thread gets the message, and informs threadA (in some way, e.g. message queue)
  6. threadA calls getSomeValue()

This is a fully synchronized structure, but why does this imply that threadA will get the most up-to-date value in step 6, if someValue is not volatile and is not accessed inside a monitor everywhere it is used?
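What makes step 6 safe is the Java Memory Model's happens-before relation: handing a message through a properly synchronized queue orders everything the sender did before sending ahead of everything the receiver does after receiving. A minimal sketch of the six steps, assuming a java.util.concurrent BlockingQueue stands in for the UI thread's message queue (Android's Looper and Handler are not modelled here) and using the hypothetical class name HandoffDemo; someValue is deliberately left non-volatile:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch only: BlockingQueues stand in for the UI thread's message queue (an assumption).
class HandoffDemo {
    static class TestClass {
        private int someValue; // plain field, NOT volatile
        public int getSomeValue() { return someValue; }
        public void setSomeValue(int value) { someValue = value; }
    }

    static int run() throws InterruptedException {
        TestClass shared = new TestClass();
        BlockingQueue<String> toUi = new ArrayBlockingQueue<>(1); // messages to the "UI thread"
        BlockingQueue<String> toA = new ArrayBlockingQueue<>(1);  // messages to threadA
        int[] result = new int[1];

        Thread threadA = new Thread(() -> {
            try {
                shared.getSomeValue();             // step 2: first read (sees 0)
                toUi.put("A read");                // ...and inform the UI thread
                toA.take();                        // wait for step 5
                result[0] = shared.getSomeValue(); // step 6
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread threadB = new Thread(() -> {
            try {
                shared.setSomeValue(42);           // step 4
                toUi.put("B wrote");               // ...and inform the UI thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        threadA.start();   // step 1 (the main thread plays the UI thread)
        toUi.take();       // step 3: receive A's message...
        threadB.start();   // ...and start threadB
        toUi.take();       // step 5: receive B's message...
        toA.put("go");     // ...and inform threadA
        threadA.join();
        threadB.join();
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // prints 42
    }
}
```

The chain is B's write, then B's put, then the main thread's take and put, then threadA's take and read. Each link is either program order within one thread or a queue handoff, which the java.util.concurrent package documents as a happens-before edge; remove either handoff and the guarantee disappears.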


Solution

  • The issue is that Java is simply a specification. There are many JVM implementations and many physical operating environments, and on any given combination an action may be safe or unsafe. For instance, on a single-processor system the volatile keyword in your example may look completely unnecessary, although even there the JIT compiler is free to keep the value in a register, so omitting it is still not safe. Since the writers of the memory and language specifications can't reasonably account for every possible set of operating conditions, they chose to white-list certain patterns that are guaranteed to work on all compliant implementations. Adhering to these guidelines ensures both that your code will work on your target system and that it will be reasonably portable.
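    Thread.start() and Thread.join() are two entries on that white-list: the Java Memory Model specifies that a call to start() happens-before any action in the started thread, and that all actions in a thread happen-before another thread returns from join() on it. A minimal sketch of the question's second scenario built on those two guarantees (the class name StartJoinDemo is hypothetical). The field stays non-volatile; it is the start/join ordering, not the fact that A has never read the field before, that makes the value visible:

```java
// Sketch: Thread.start()/join() ordering makes a plain field write visible.
class StartJoinDemo {
    static class TestClass {
        private int someValue; // plain field, NOT volatile
        public int getSomeValue() { return someValue; }
        public void setSomeValue(int value) { someValue = value; }
    }

    static int run() throws InterruptedException {
        TestClass shared = new TestClass();
        int[] seenByA = new int[1];

        Thread threadB = new Thread(() -> shared.setSomeValue(42));
        Thread threadA = new Thread(() -> seenByA[0] = shared.getSomeValue());

        threadB.start();
        threadB.join();   // everything B did happens-before join() returns
        threadA.start();  // everything before start() happens-before A's first action
        threadA.join();   // and A's result is visible here after join()
        return seenByA[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // prints 42
    }
}
```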

    In this case "caching" typically refers to activity at the hardware level. Certain events in Java cause the cores of a multiprocessor system to "synchronize" their caches: accesses to volatile variables are one example, and synchronized blocks are another. Imagine a scenario where two threads, X and Y, are scheduled to run on different processors.
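    As a sketch of the synchronized-block route, the question's TestClass can guard both accessors with the same monitor. The Java Memory Model guarantees that an unlock of a monitor happens-before every subsequent lock of that same monitor, so a synchronized get that runs after a synchronized set sees its value. The class name SynchronizedTestClass and the waitForWrite helper are hypothetical:

```java
// Sketch: the question's TestClass with both accessors guarded by the same monitor (this).
class SynchronizedTestClass {
    private int someValue; // only accessed while holding the lock on "this"

    public synchronized int getSomeValue() { return someValue; }
    public synchronized void setSomeValue(int value) { someValue = value; }

    // Hypothetical demo: a writer thread sets the value while the caller polls for it.
    static int waitForWrite(int expected) throws InterruptedException {
        SynchronizedTestClass shared = new SynchronizedTestClass();
        Thread writer = new Thread(() -> shared.setSomeValue(expected));
        writer.start();
        // Each getSomeValue() call re-acquires the monitor, so once the writer has
        // released it, a later get is guaranteed to see the new value.
        int seen;
        while ((seen = shared.getSomeValue()) != expected) {
            Thread.yield();
        }
        writer.join();
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(waitForWrite(7)); // prints 7
    }
}
```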

    X starts and is scheduled on proc 1
    Y starts and is scheduled on proc 2
    
    .. now you have two threads executing simultaneously;
    to speed things up, the processors check their local caches
    before going to main memory, because that is expensive.
    
    X calls setSomeValue("x-value") // assuming proc 1's cache is empty, the cache is set;
                                    // this value is dropped on the bus to be flushed
                                    // to main memory
                                    // now all gets will retrieve from the cache instead
                                    // of engaging the memory bus to go to main memory
    Y calls setSomeValue("y-value") // the same thing happens for proc 2
    
    // Now, depending on the order in which things are scheduled and
    // which thread you are calling from, calls to getSomeValue() may return
    // "x-value" or "y-value". The results are completely unpredictable.
    

    The point is that volatile (on compliant implementations) ensures that writes to the variable are always flushed to main memory and that other processors' cached copies are flagged as stale before the next access, regardless of which thread makes that access. In Java Memory Model terms, a write to a volatile variable happens-before every subsequent read of that variable by any thread.
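    That happens-before edge also covers ordinary writes the writing thread made before the volatile write, which is what makes a volatile "ready" flag a safe way to publish other data. A minimal sketch (the class name VolatilePublishDemo and its fields are hypothetical) of a plain data field published through a volatile flag:

```java
// Sketch: a volatile write publishes the plain writes that precede it.
class VolatilePublishDemo {
    private int data;               // plain field
    private volatile boolean ready; // volatile flag

    void writer() {
        data = 42;    // (1) ordinary write
        ready = true; // (2) volatile write: (1) happens-before any read that sees true
    }

    int reader() {
        while (!ready) {  // volatile read: performed fresh on every iteration
            Thread.yield();
        }
        return data;      // guaranteed to be 42 once ready was seen as true
    }

    static int run() throws InterruptedException {
        VolatilePublishDemo demo = new VolatilePublishDemo();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = demo.reader());
        Thread writer = new Thread(demo::writer);
        reader.start();
        writer.start();
        reader.join();
        writer.join();
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // prints 42
    }
}
```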

    Disclaimer: volatile DOES NOT LOCK. This is especially important in the following case:

    volatile int counter;
    
    public void incrementSomeValue() {
        counter++; // Bad thread juju: this is at least three steps
                   // (read - increment - write), and there is no guarantee
                   // that the operation is atomic, so two threads can read
                   // the same value and one update is lost
    }
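
    If such a counter really must be updated from several threads, one common fix is java.util.concurrent.atomic.AtomicInteger, whose incrementAndGet() performs the read-increment-write as a single atomic operation (a synchronized method would also work). A minimal sketch, with the hypothetical class name CounterDemo:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: an atomic counter that does not lose updates under contention.
class CounterDemo {
    private final AtomicInteger counter = new AtomicInteger();

    public void incrementSomeValue() {
        counter.incrementAndGet(); // atomic read-increment-write
    }

    public int get() {
        return counter.get();
    }

    static int run(int threads, int incrementsPerThread) throws InterruptedException {
        CounterDemo demo = new CounterDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    demo.incrementSomeValue();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        return demo.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // With "volatile int counter" and counter++, this total could come out lower.
        System.out.println(run(4, 100_000)); // prints 400000
    }
}
```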
    

    This could be relevant to your question if your intent is that setSomeValue() must always be called before getSomeValue().

    If the intent is that getSomeValue() must always reflect the most recent call to setSomeValue(), then this is a good place to use the volatile keyword. Just remember that without it there is no guarantee that getSomeValue() will reflect the most recent call to setSomeValue(), even if setSomeValue() was scheduled first.
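
    Applied to the question's class, that means declaring someValue volatile. A minimal sketch of the first scenario (A reads, B writes, A reads again), with the hypothetical class name VolatileTestClass and a polling loop standing in for "A reads again later":

```java
// Sketch: the question's TestClass with someValue declared volatile.
class VolatileTestClass {
    private volatile int someValue;

    public int getSomeValue() { return someValue; }
    public void setSomeValue(int value) { someValue = value; }

    // Hypothetical demo of scenario 1; returns the value A finally observes.
    static int scenarioOne() throws InterruptedException {
        VolatileTestClass shared = new VolatileTestClass();
        int[] lastRead = new int[1];
        Thread threadA = new Thread(() -> {
            shared.getSomeValue(); // step 1: may see 0 (or 99 if B already ran)
            int v;
            // Step 3, repeated until B's write shows up. Because the field is volatile,
            // every call performs a fresh read; the JIT cannot hoist it out of the loop.
            while ((v = shared.getSomeValue()) != 99) {
                Thread.yield();
            }
            lastRead[0] = v;
        });
        Thread threadB = new Thread(() -> shared.setSomeValue(99)); // step 2
        threadA.start();
        threadB.start();
        threadA.join();
        threadB.join();
        return lastRead[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(scenarioOne()); // prints 99
    }
}
```

    Without volatile, the same loop is allowed to spin forever, because nothing forces A to re-read the field after its first access.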