How to understand happens-before consistent

In chapter 17 of JLS, it introduce a concept: happens-before consistent.

A set of actions A is happens-before consistent if for all reads r in A, where W(r) is the write action seen by r, it is not the case that either hb(r, W(r)) or that there exists a write w in A such that w.v = r.v and hb(W(r), w) and hb(w, r)"

In my understanding, it equals to following words: ..., it is the case that neither ... nor ...

So my first two questions are:

is my understanding right?
what does "w.v = r.v" mean?

It also gives an Example: 17.4.5-1

Thread 1 Thread 2

B = 1; A = 2; 

r2 = A; r1 = B;

In first execution order:

1: B = 1;

3: A = 2;

2: r2 = A;  // sees initial write of 0

4: r1 = B;  // sees initial write of 0

The order itself has already told us that two threads are executed alternately, so my third question is: what does left number mean?

In my understanding, the reason of both r2 and r1 can see initial write of 0 is both A and B are not volatile field. So my fourth quesiton is: whether my understanding is right?

In second execution order:

1: r2 = A;  // sees write of A = 2

3: r1 = B;  // sees write of B = 1

2: B = 1;

4: A = 2;

According to definition of happens-before consistency, it is not difficult to understand this execution order is happens-before consistent(if my first understanding is correct). So my fifth and sixth questions are: does it exist this situation (reads see writes that occur later) in real world? If it does, could you give me a real example?

Solution

Each thread can be on a different core with its own private registers which Java can use to hold values of variables, unless you force access to coherent shared memory. This means that one thread can write to a value storing in a register, and this value is not visible to another thread for some time, like the duration of a loop or whole function. (milli-seconds is not uncommon)

A more extreme example is that the reading thread's code is optimised with the assumption that since it never changes the value, it doesn't need to read it from memory. In this case the optimised code never sees the change performed by another thread.

In both cases, the use of volatile ensures that reads and write occur in a consistent order and both threads see the same value. This is sometimes described as always reading from main memory, though it doesn't have to be the case because the caches can talk to each other directly. (So the performance hit is much smaller than you might expect).

On normal CPUs, caches are "coherent" (can't hold stale / conflicting values) and transparent, not managed manually. Making data visible between threads just means doing an actual load or store instruction in asm to access memory (through the data caches), and optionally waiting for the store buffer to drain to give ordering wrt. other later operations.