java multithreading volatile cpu-cache memory-barriers

Is my understanding of Java volatile and reordering correct for this code?

According to these reordering rules, if I have code like this:

volatile int a = 0;

boolean b = false;

void foo1() { a = 10; b = true; }

void foo2() { if (b) { assert a == 10; } }

Suppose Thread A runs foo1 and Thread B runs foo2. Since a = 10 is a volatile store and b = true is a normal store, these two statements could possibly be reordered, which means Thread B may see b == true while a != 10. Is that correct?

Added:

Thanks for your answers!
I am just starting to learn about Java multithreading and have been struggling a lot with the volatile keyword.

Many tutorials talk about the visibility of volatile fields, e.g. "volatile field becomes visible to all readers (other threads in particular) after a write operation completes on it". I don't understand how a completed write to a field could be invisible to other threads (or CPUs).

As I understand it, a completed write means the field has been successfully written back to the cache, and according to MESI, every other thread that has cached this field should then have an Invalid cache line. One exception (since I am not very familiar with the hardware, this is just a conjecture) is that the result might be written back to a register instead of the cache, and I do not know whether there is some protocol that keeps things consistent in that situation, or whether volatile in Java prevents the write from going to a register.

Some situations look as if the write is "invisible", for example:

    int A = 0, B = 0;
    void thread1() { A = 1; B = 2; }
    void thread2() { if (B == 2) { /* A may be 0 here */ } }

Suppose the compiler did not reorder it. Then what we see in thread2 is caused by the store buffer, and I do not think a write that is still in the store buffer counts as a completed write. The store buffer and invalidate queue make the write to A look invisible, but in fact the write has not finished when thread2 reads A. Even if we make field B volatile, so that the write to B goes into the store buffer together with memory barriers, thread2 can still read B as 0 and finish. To me, volatile seems to be less about the visibility of the field it is declared on and more like an edge that makes sure all writes that happen before the volatile write in thread A are visible to all operations after the volatile read in another thread B (provided the volatile read happens after the volatile write in thread A has completed).

By the way, since I am not a native speaker, I have read many tutorials in my native language (and some English ones) that say volatile instructs JVM threads to read the value of a volatile variable from main memory and not to cache it locally. I do not think that is true. Am I right?

Anyway, thanks for your answers. Since I am not a native speaker, I hope I have expressed myself clearly.

Solution

Answer to your addition.

Many tutorials talk about the visibility of volatile fields, e.g. "volatile field becomes visible to all readers (other threads in particular) after a write operation completes on it". I don't understand how a completed write to a field could be invisible to other threads (or CPUs).

The compiler (in practice usually the JIT) might transform the code so that the write is never seen.

For example:

boolean stop;

void run(){
  while(!stop)println();
}

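First optimization: the load of stop is hoisted out of the loop (stop is not volatile, so the compiler may assume no other thread changes it).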

void run(){
   boolean r1=stop;
   while(!r1)println();
}

Second optimization: r1 never changes inside the loop, so the condition only needs to be checked once.

void run(){
   boolean r1=stop;
   if(r1) return;   // stop was already true: the original loop body never runs
   while(true) println();
}

Now it is obvious that this loop will never stop, because the new value of stop is effectively never seen. Something similar can be done with a store, which could postpone it indefinitely.
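For contrast, a minimal sketch of the same loop with the flag declared volatile. This rules out both optimizations, because every iteration has to perform a fresh load of the field. The class name StopFlagDemo and the helper workerStops are made up for this illustration, and the local counter stands in for the println() call.

```java
// Sketch: the stop-flag loop with the flag declared volatile.
// Class and method names are illustrative only.
class StopFlagDemo {
    // volatile forbids hoisting the load out of the loop,
    // so every iteration re-reads the field.
    private static volatile boolean stop;

    /** Starts a spinning worker, sets stop, and reports whether the worker exited. */
    static boolean workerStops(long timeoutMillis) {
        stop = false;
        Thread worker = new Thread(() -> {
            long spins = 0;           // stands in for the println() call
            while (!stop) {
                spins++;
            }
        });
        worker.setDaemon(true);       // never keep the JVM alive if something goes wrong
        worker.start();
        try {
            Thread.sleep(50);         // let the worker spin for a while
            stop = true;              // volatile store: the worker's next load sees it
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + workerStops(5_000));
    }
}
```

With a plain boolean the same program may hang or may stop depending on what the JIT decides, which is exactly why that outcome can't be relied on.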

As I understand it, a completed write means the field has been successfully written back to the cache, and according to MESI, every other thread that has cached this field should then have an Invalid cache line.

Correct. This is normally called 'globally visible' or 'globally performed'.
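To make "globally visible" concrete, here is a hypothetical toy model of the MESI states of a single cache line shared by a few cores. It deliberately ignores data, write-backs, bus timing, store buffers and invalidate queues, so it only shows the state transitions: once a core writes, every other copy is Invalid, and in this model that is the point at which the store counts as globally performed.

```java
import java.util.Arrays;

// Hypothetical toy model of MESI for one cache line shared by N cores.
// Heavily simplified: no data, no bus timing, no store buffers.
class MesiLine {
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }

    private final State[] states;

    MesiLine(int cores) {
        states = new State[cores];
        Arrays.fill(states, State.INVALID);
    }

    State state(int core) { return states[core]; }

    /** A load by core: on a miss, fetch the line; any valid copy elsewhere drops to SHARED. */
    void read(int core) {
        if (states[core] != State.INVALID) return;   // hit: nothing changes
        boolean othersHaveIt = false;
        for (int i = 0; i < states.length; i++) {
            if (i != core && states[i] != State.INVALID) {
                // a real MODIFIED copy would be written back before being shared
                states[i] = State.SHARED;
                othersHaveIt = true;
            }
        }
        states[core] = othersHaveIt ? State.SHARED : State.EXCLUSIVE;
    }

    /** A store by core: every other copy is invalidated and the writer ends in MODIFIED. */
    void write(int core) {
        for (int i = 0; i < states.length; i++) {
            if (i != core) states[i] = State.INVALID;
        }
        states[core] = State.MODIFIED;
    }

    public static void main(String[] args) {
        MesiLine line = new MesiLine(2);
        line.read(0);    // core 0: EXCLUSIVE
        line.read(1);    // both cores: SHARED
        line.write(0);   // core 0: MODIFIED, core 1: INVALID
        System.out.println(line.state(0) + " " + line.state(1));   // MODIFIED INVALID
    }
}
```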

One exception (since I am not very familiar with the hardware, this is just a conjecture) is that the result might be written back to a register instead of the cache, and I do not know whether there is some protocol that keeps things consistent in that situation, or whether volatile in Java prevents the write from going to a register.

All modern processors are load/store architectures (even X86, after conversion to uops), meaning that there are explicit load and store instructions that transfer data between registers and memory, and regular instructions like add/sub can only work with registers. So a register needs to be used anyway. The key part is that the compiler should respect the loads/stores of the source code and limit its optimizations.

Suppose the compiler did not reorder it. Then what we see in thread2 is caused by the store buffer, and I do not think a write that is still in the store buffer counts as a completed write. The store buffer and invalidate queue make the write to A look invisible, but in fact the write has not finished when thread2 reads A.

On X86, the stores in the store buffer are kept in program order and commit to the cache in program order. But there are architectures where stores from the store buffer can commit to the cache out of order, e.g. due to:

write coalescing
allowing stores to commit to the cache as soon as the cache line is returned in the right state, even if an earlier store is still waiting.
sharing the store buffer with a subset of the CPUs.

Store buffers can be a source of reordering, but so can out-of-order and speculative execution.
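A hypothetical toy model (not a real CPU or JVM API) can make the store-buffer point concrete: each core's store sits in its own FIFO buffer until it drains, the core's own loads see it through store-to-load forwarding, but other cores still read the older value from the cache. Running the classic store-buffering pattern through it gives r1 = 0 and r2 = 0, an outcome that no sequentially consistent interleaving allows.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical toy model: per-core FIFO store buffers in front of one coherent cache.
// It ignores invalidate queues, timing and everything else a real CPU does.
class StoreBufferModel {
    private final Map<String, Integer> cache = new HashMap<>();   // globally visible state
    private final List<Deque<Map.Entry<String, Integer>>> buffers = new ArrayList<>();

    StoreBufferModel(int cores) {
        for (int i = 0; i < cores; i++) buffers.add(new ArrayDeque<>());
    }

    /** A store only enters the core's own store buffer; other cores can't see it yet. */
    void store(int core, String var, int value) {
        buffers.get(core).addLast(Map.entry(var, value));
    }

    /** A load checks the core's own buffer first (store-to-load forwarding), newest entry first. */
    int load(int core, String var) {
        Iterator<Map.Entry<String, Integer>> it = buffers.get(core).descendingIterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> e = it.next();
            if (e.getKey().equals(var)) return e.getValue();
        }
        return cache.getOrDefault(var, 0);   // unwritten variables read as 0
    }

    /** Commits the core's buffered stores to the cache in program order (X86-like FIFO). */
    void drain(int core) {
        Deque<Map.Entry<String, Integer>> buffer = buffers.get(core);
        while (!buffer.isEmpty()) {
            Map.Entry<String, Integer> e = buffer.removeFirst();
            cache.put(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        StoreBufferModel m = new StoreBufferModel(2);
        // Store-buffering litmus test: both stores are still buffered when the loads run.
        m.store(0, "X", 1);
        m.store(1, "Y", 1);
        int r1 = m.load(0, "Y");
        int r2 = m.load(1, "X");
        System.out.println("r1=" + r1 + " r2=" + r2);   // r1=0 r2=0: impossible under SC
        m.drain(0);
        m.drain(1);
        System.out.println("after drain: X=" + m.load(1, "X") + " Y=" + m.load(0, "Y"));
    }
}
```

In the model, a store only becomes globally visible when drain() moves it into the cache, which matches the point that a write still sitting in the store buffer is not yet a completed write.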

Apart from the stores, reordering loads can also lead to observing stores out of order. On X86, loads are not (visibly) reordered with other loads, but on ARM this is allowed. And of course the JIT can mess things up as well.

Even if we make field B volatile, so that the write to B goes into the store buffer together with memory barriers, thread2 can still read B as 0 and finish.

It is important to realize that the JMM is based on sequential consistency. Even though it is a relaxed memory model (plain loads and stores are separated from synchronization actions like volatile loads/stores and lock/unlock), if a program has no data races, it will only produce sequentially consistent executions. Sequential consistency doesn't require real-time order to be respected. So it is perfectly fine for a load/store to be skewed as long as:

the memory order is a total order over all loads/stores
the memory order is consistent with the program order
a load sees the most recent write before it in the memory order.
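The total-order requirement can be illustrated with the store-buffering (Dekker) pattern. With x and y volatile the program has no data races, so the outcome r1 == 0 && r2 == 0 is forbidden, even though store buffers could produce it for plain fields. This is a rough sketch with made-up names; a hand-rolled loop like this rarely makes the two threads truly overlap, so for serious litmus testing a dedicated harness such as OpenJDK's jcstress is the better tool.

```java
// Store-buffering (Dekker) litmus test with volatile fields. Names are illustrative.
class VolatileStoreBuffering {
    private static volatile int x;
    private static volatile int y;

    /** Runs the test repeatedly; returns how often both threads read 0 (must be 0). */
    static int countBothZero(int iterations) {
        int bothZero = 0;
        for (int i = 0; i < iterations; i++) {
            x = 0;
            y = 0;
            int[] r = new int[2];   // r[0] = r1, r[1] = r2
            Thread t1 = new Thread(() -> { x = 1; r[0] = y; });   // volatile store, then volatile load
            Thread t2 = new Thread(() -> { y = 1; r[1] = x; });
            t1.start();
            t2.start();
            try {
                t1.join();
                t2.join();   // join() makes the writes to r visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            // In any total order consistent with program order, whichever store
            // comes first is seen by the other thread's later load.
            if (r[0] == 0 && r[1] == 0) bothZero++;
        }
        return bothZero;
    }

    public static void main(String[] args) {
        System.out.println("r1 == 0 && r2 == 0 observed " + countBothZero(2_000) + " times");
    }
}
```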

To me, volatile seems to be less about the visibility of the field it is declared on and more like an edge that makes sure all writes that happen before the volatile write in thread A are visible to all operations after the volatile read in another thread B (provided the volatile read happens after the volatile write in thread A has completed).

You are on the right path.

Example.

int a = 0;
volatile int b = 0;

void thread1() {
   a = 1;      // 1
   b = 1;      // 2
}

void thread2() {
   int r1 = b; // 3
   int r2 = a; // 4
}

In this case there is a happens-before edge between 1 and 2 (program order). If r1 = 1, then there is a happens-before edge between 2 and 3 (volatile write and read of b) and a happens-before edge between 3 and 4 (program order).

Because the happens-before relation is transitive, there is a happens-before edge between 1 and 4. So if r1 = 1, r2 must be 1.
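A runnable sketch of the example above, with illustrative class and method names. It repeats the two threads many times and counts runs where thread2 saw the volatile write (r1 == 1) but still read the old value of a. By the happens-before argument that count must be 0; runs with r1 == 0 are allowed to read either value of a, so they are not counted.

```java
// Runnable version of the a/b example. Field and method names are illustrative.
class VolatilePublication {
    private static int a;            // plain field
    private static volatile int b;   // volatile flag

    /** Runs the example repeatedly; returns how often r1 == 1 but r2 != 1 (must be 0). */
    static int countViolations(int iterations) {
        int violations = 0;
        for (int i = 0; i < iterations; i++) {
            a = 0;
            b = 0;
            int[] r = new int[2];   // r[0] = r1, r[1] = r2
            Thread thread1 = new Thread(() -> {
                a = 1;      // 1: plain store
                b = 1;      // 2: volatile store
            });
            Thread thread2 = new Thread(() -> {
                r[0] = b;   // 3: volatile load
                r[1] = a;   // 4: plain load
            });
            thread2.start();
            thread1.start();
            try {
                thread1.join();
                thread2.join();   // join() makes r[0] and r[1] visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            // If 3 saw the write at 2, then 1 happens-before 4, so r2 must be 1.
            if (r[0] == 1 && r[1] != 1) violations++;
        }
        return violations;
    }

    public static void main(String[] args) {
        System.out.println("violations: " + countViolations(2_000));
    }
}
```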

volatile takes care of the following:

Visibility: it needs to make sure the load/store doesn't get optimized out.
Atomicity: the load/store is atomic, so it can never be seen partially (this matters for long and double, whose plain writes may be split into two 32-bit halves).
Ordering: most importantly, it needs to make sure that the order between 1-2 and 3-4 is preserved.
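The atomicity point can be sketched with a long field, names again illustrative. The Java Language Specification (section 17.7) allows a non-volatile long or double write to be split into two 32-bit halves, while volatile long and double accesses are always atomic. Note that on most 64-bit JVMs a plain long would very likely pass this check too; the point is that volatile turns it into a language guarantee.

```java
// Sketch: a reader checks that it never observes a "torn" volatile long.
// Class and method names are illustrative only.
class VolatileLongAtomicity {
    // volatile makes every read and write of this 64-bit field atomic.
    private static volatile long value;

    /** Reads the field while a writer flips it; returns the number of torn reads (must be 0). */
    static long countTornReads(long reads) {
        value = 0L;
        Thread writer = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                value = 0L;
                value = -1L;   // all 64 bits set
            }
        });
        writer.setDaemon(true);
        writer.start();
        long torn = 0;
        for (long i = 0; i < reads; i++) {
            long v = value;
            // a mix of halves would look like 0x00000000FFFFFFFFL or 0xFFFFFFFF00000000L
            if (v != 0L && v != -1L) torn++;
        }
        writer.interrupt();   // ask the writer loop to exit
        return torn;
    }

    public static void main(String[] args) {
        System.out.println("torn reads: " + countTornReads(10_000_000L));
    }
}
```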

By the way, since I am not a native speaker, I have read many tutorials in my native language (and some English ones) that say volatile instructs JVM threads to read the value of a volatile variable from main memory and not to cache it locally. I do not think that is true.

You are completely right. This is a very common misconception. Caches are the source of truth since they are always coherent. If every write had to go to main memory, programs would become extremely slow. Memory is just a spill bucket for whatever doesn't fit in the cache and can be completely incoherent with the cache. Plain and volatile loads/stores both go through the cache. It is possible to bypass the cache in special situations like MMIO or when using certain (non-temporal) SIMD instructions, but that isn't relevant for these examples.

Anyway, thanks for your answers. Since I am not a native speaker, I hope I have expressed myself clearly.

Most people here are not native speakers (I'm certainly not one). Your English is good enough, and you show a lot of promise.