I am writing this post in connection with Deep understanding of volatile in Java:
public class Main {
    private int x;
    private volatile int g;

    public void actor1() {
        x = 1;
        g = 1;
    }

    public void actor2() {
        put_on_screen_without_sync(g);
        put_on_screen_without_sync(x);
    }
}
Now, I am analyzing what the JIT generated for the above piece of code. From the discussion in my previous post we know that the output 1, 0 is impossible, because:

a write to a volatile v causes every action a preceding v to be visible (to be flushed to memory) before v becomes visible.
.................(I removed the unimportant body of the method).................
0x00007f42307d9d5e: c7460c01000000 (1) mov dword ptr [rsi+0ch],1h
;*putfield x
; - package.Main::actor1@2 (line 14)
0x00007f42307d9d65: bf01000000 (2) mov edi,1h
0x00007f42307d9d6a: 897e10 (3) mov dword ptr [rsi+10h],edi
0x00007f42307d9d6d: f083042400 (4) lock add dword ptr [rsp],0h
;*putfield g
; - package.Main::actor1@7 (line 15)
0x00007f42307d9d72: 4883c430 add rsp,30h
0x00007f42307d9d76: 5d pop rbp
0x00007f42307d9d77: 850583535116 test dword ptr [7f4246cef100h],eax
; {poll_return}
0x00007f42307d9d7d: c3 ret
Do I understand correctly that it works because x86 cannot perform StoreStore reordering? If it could, an additional memory barrier would be required, yes?
EDITED AFTER @Eugene's EXCELLENT ANSWER:
int tmp = i; // volatile load
// [LoadStore]
// [LoadLoad]

Here, I see what you mean; it is clear: every action below (after) the volatile read (int tmp = i) cannot be reordered above it.
// [StoreLoad] -- this one
int tmp = i; // volatile load
// [LoadStore]
// [LoadLoad]

Here, you put one more barrier. It ensures that no action will be reordered with int tmp = i. But why is it important? Why do I have doubts? From what I know, a volatile load guarantees:

Every action after a volatile load won't be reordered before the volatile load is visible.
I see you write:
There needs to be a sequential consistency
But I cannot see why sequential consistency is required.
A couple of things. First, "will be flushed to memory" - that's pretty erroneous. It's almost never a flush to main memory: it usually drains the store buffer to L1, and it's up to the cache-coherency protocol to sync the data between all caches. But if it's easier for you to understand the concept in these terms, that's fine - just know that it is slightly different and faster.
It's a good question why the [StoreLoad] is there indeed; maybe this will clear things up a bit. volatile is indeed all about fences, and here is an example of which barriers would be inserted for some volatile operations. For example, we have a volatile load:
// i is some shared volatile field
int tmp = i; // volatile load of "i"
// [LoadLoad|LoadStore]
Notice the two barriers here, LoadStore and LoadLoad; in plain English they mean that any Load and Store that comes after a volatile load/read cannot "move up" across the barrier - they cannot be re-ordered "above" that volatile load.
And here is the example for a volatile store:

// "i" is a shared volatile variable
// [StoreStore|LoadStore]
i = tmp; // volatile store

It means that any Load and Store cannot go "below" the volatile store itself.
This basically builds the happens-before relationship: the volatile load is the acquiring load and the volatile store is the releasing store (this also has to do with how the Store and Load CPU buffers are implemented, but that is pretty much out of the scope of the question).
If you think about it, this makes perfect sense given what we know about volatile in general: once a volatile store has been observed by a volatile load, everything prior to that volatile store will be observed as well, which is on par with the memory barriers. When a volatile store takes place, everything above it cannot move below it, and once a volatile load happens, everything below it cannot move above it; otherwise that happens-before relationship would be broken.
But that's not all - there's more. There needs to be sequential consistency, which is why any sane implementation will guarantee that volatiles themselves are not re-ordered; thus two more fences are inserted:
// any store of some other volatile
// can not be reordered with this volatile load
// [StoreLoad] -- this one
int tmp = i; // volatile load of a shared variable "i"
// [LoadStore|LoadLoad]
And one more here:
// [StoreStore|LoadStore]
i = tmp; // volatile store
// [StoreLoad] -- and this one
Now, it turns out that on x86 3 out of these 4 memory barriers are free, since it is a strong memory model. The only one that needs to be implemented is StoreLoad. On other CPUs - PowerPC, for example - lwsync is one instruction used for such barriers, but I don't know much about them.
Usually an mfence is a good option for StoreLoad on x86, but the same thing is guaranteed via lock add (AFAIK in a cheaper way), which is why you see it there. Basically, that is the StoreLoad barrier. And yes, you are right in your last sentence: on a weaker memory model the StoreStore barrier would be required. On a side note, that is also what is used when you safely publish a reference via final fields inside a constructor: upon exiting the constructor, two fences are inserted, LoadStore and StoreStore.
Take all this with a grain of salt - a JVM is free to ignore these as long as it does not break any rules: Aleksey Shipilev has a great talk about this.
EDIT
Suppose you have this case:
[StoreStore|LoadStore]
int x = 4; // volatile store of a shared "x" variable
int y = 3; // non-volatile store of shared variable "y"
int z = x; // volatile load
[LoadLoad|LoadStore]
Basically there is no barrier that would prevent the volatile store from being re-ordered with the volatile load (i.e., the volatile load would be performed first), and that would obviously cause problems; sequential consistency would thus be violated.
By the way, you are sort of missing the point (if I am not mistaken) with Every action after volatile load won't be reordered before volatile load is visible. Re-ordering is not possible with the volatile itself - other operations are free to be re-ordered among themselves. Let me give you an example:
int tmp = i; // volatile load of a shared variable "i"
// [LoadStore|LoadLoad]
int x = 3; // plain store
int y = 4; // plain store
The last two operations, x = 3 and y = 4, are absolutely free to be re-ordered; they cannot float above the volatile load, but they can be re-ordered with each other. The example below would be perfectly legal:
int tmp = i; // volatile load
// [LoadStore|LoadLoad]
// see how they have been inverted here...
int y = 4; // plain store
int x = 3; // plain store