There is an illustration in kernel source Documentation/memory-barriers.txt, like this:
CPU 1 CPU 2 ======================= ======================= { B = 7; X = 9; Y = 8; C = &Y } STORE A = 1 STORE B = 2 <write barrier> STORE C = &B LOAD X STORE D = 4 LOAD C (gets &B) LOAD *C (reads B)
Without intervention, CPU 2 may perceive the events on CPU 1 in some effectively random order, despite the write barrier issued by CPU 1:
+-------+ : : : : | | +------+ +-------+ | Sequence of update | |------>| B=2 |----- --->| Y->8 | | of perception on | | : +------+ \ +-------+ | CPU 2 | CPU 1 | : | A=1 | \ --->| C->&Y | V | | +------+ | +-------+ | | wwwwwwwwwwwwwwww | : : | | +------+ | : : | | : | C=&B |--- | : : +-------+ | | : +------+ \ | +-------+ | | | |------>| D=4 | ----------->| C->&B |------>| | | | +------+ | +-------+ | | +-------+ : : | : : | | | : : | | | : : | CPU 2 | | +-------+ | | Apparently incorrect ---> | | B->7 |------>| | perception of B (!) | +-------+ | | | : : | | | +-------+ | | The load of X holds ---> \ | X->9 |------>| | up the maintenance \ +-------+ | | of coherence of B ----->| B->2 | +-------+ +-------+ : :
I don't understand, since we have a write barrier, so, any store must take effect when C = &B is executed, which means whence B would equals 2. For CPU 2, B should have been 2 when it gets the value of C, which is &B, why would it perceive B as 7. I am really confused.
The key missing point is the mistaken assumption that for the sequence:
LOAD C (gets &B)
LOAD *C (reads B)
the first load has to precede the second load. A weakly ordered architectures can act "as if" the following happened:
LOAD B (reads B)
LOAD C (reads &B)
if( C!=&B )
LOAD *C
else
Congratulate self on having already loaded *C
The speculative "LOAD B" can happen, for example, because B was on the same cache line as some other variable of earlier interest or hardware prefetching grabbed it.