I'm looking into the implementation of ConcurrentReferenceHashMap
in Spring Framework, particularly into restructure()
method:
protected final class Segment extends ReentrantLock {
private volatile Reference<K, V>[] references; // <-- !
private void restructure(boolean allowResize, @Nullable Reference<K, V> ref) {
boolean needsResize;
lock();
try {
//...
boolean resizing = false;
int restructureSize = this.references.length; // <-- !
//...
Reference<K, V>[] restructured =
(resizing ? createReferenceArray(restructureSize) : this.references);// <-- !
for (int i = 0; i < this.references.length; i++) { // <-- !
ref = this.references[i]; // <-- !
//...
}
if (resizing) {
this.references = restructured;
this.resizeThreshold = (int) (this.references.length * getLoadFactor());// <-- !
}
//...
}
finally {
unlock();
}
}
As you can see here we have multiple reads and writes into volatile references
array and all of them happen within lock()/unlock()
synchronization block.
The JavaDoc of java.util.concurrent.locks.Lock, namely its Memory Synchronization
part claims, that
All Lock implementations must enforce the same memory synchronization semantics as provided by the built-in monitor lock, as described in Chapter 17 of The Java™ Language Specification:
- A successful lock operation has the same memory synchronization effects as a successful Lock action.
- A successful unlock operation has the same memory synchronization effects as a successful Unlock action.
Unsuccessful locking and unlocking operations, and reentrant locking/unlocking operations, do not require any memory synchronization effects.
My question is: can I rewrite the code in order to have one read from volatile field into a local var (i.e. synchronize on stack) and use it to avoid repeating volatile access? Won't it break JMM assuming that
Unsuccessful locking and unlocking operations, and reentrant locking/unlocking operations, do not require any memory synchronization effects.
Here is my answer to the additional question in this comment:
I don't understand the race here: writing into
references
within restructure() is guarded by lock()-unlock() block, so it happens-before read fromreferences
in getReference(), doesn't it? And vice-versa: reading fromreferences
in getReference() "sees" only the last (in terms of execution within lock-unlock) writing intoreferences
, doesn't it?
Few notes first:
... writing into
references
within restructure() is guarded by lock()-unlock() block, so it happens-before read fromreferences
in getReference() ...
That is not true.
lock()-unlock()
block provides happens-before and atomicity guarantees only to other lock()-unlock()
blocks (and they must use the same Lock
object).
The read in getReference()
is not inside lock()-unlock()
, so it can happen in parallel with another thread writing something inside restructure()
method.
... reading from
references
in getReference() "sees" only the last (in terms of execution within lock-unlock) writing intoreference
...
references
(I am talking here about the reference
which is an instance field in Segment
, not the local variable with the same name in getReference()
) is a volatile
field, and, as a result, all reads and writes to this field happens in a global order (synchronization order, one per execution, can be different in different executions), and every read of references
always sees the latest write to it.
It's important to understand that:
references
field (which stores reference to Reference<K, V>[]
object) is volatile, not the Reference<K, V>[]
object itselfreferences[i]
aren't volatile even if the references
is volatile.Here is an execution that has a data race (the code is simplified for clarity):
volatile references = [val0]; // Initially
Thread 1 | Thread 2 | Thread 3
----------------------------------------------------|------------------------
| restructure(...) { |
| lock(); |
| references[0] = val1; |
| unlock(); |
| } | restructure(...) {
| | lock();
| | references[0] = val2;
| | unlock();
| | }
getReference(...) | |
var r2 = references[0]; | |
} | |
Given that:
references[0] = val2
is actually 2 actions from JMM perspective:
references
references[0]
var r2 = references[0];
is also 2 actions from JMM perspective:
references
references[0]
element of the arrayhere is a version that reflects that:
// Initially
volatile v = [val0];
Thread 1 | Thread 2 | Thread 3
--------------------------------------------
| lock(); |
| r3 = v; |
| r3[0] = val1; |
| unlock(); |
| | lock();
| | r4 = v;
| | r4[0] = val2;
| | unlock();
r1 = v; | |
r2 = r1[0]; | |
Let's rewrite it in terms of JLS actions:
o
represent a Reference<K, V>[]
objectv
is a volatile
variable initialized with a reference to o
objectRv
is a volatile readR
and W
are non-volatile read and writex
in R(..):x
or Rv(..):x
shows the value returned by the read// Initial writes
W(o=[val0]), Wv(v=o)
T1 | T2 | T3
-----------------------------------------
| Lock |
| Rv(v):o |
| W(o[0]=val1) |
| Unlock |
| | Lock
| | Rv(v):o
| | W(o[0]=val2)
| | Unlock
Rv(v):o | |
R(o[0]):? | |
Program order:
T1.Rv(v) -> T1.R(o[0])
T2.Lock -> T2.Rv(v) -> T2.W(o[0]=val1) -> T2.Unlock
T3.Lock -> T3.Rv(v) -> T3.W(o[0]=val2) -> T3.Unlock
Synchronization order (specific for this execution) — global order for synchronization actions:
Initial writes -> T2.Lock -> T2.Rv(v) -> T2.Unlock -> T3.Lock -> T3.Rv(v) -> T3.Unlock -> T1.Rv(v)
Synchronized-with relation (it exists between some action pairs in synchronization order):
Initial writes -> 1st action in every thread
T2.Unlock -> T3.Lock
Here are both program order and synchronized-with:
W(o=[val0])
↓ (po)
Wv(v=o)
┌——————————┼————————————┐
T1 │ T2 │ T3 │
│ ↓ (sw) │
│ Lock │
│ ↓ (po) │
│ Rv(v):o │
│ ↓ (po) │
│ W(o[0]=val1) │
│ ↓ (po) │
│ Unlock ↓ (sw)
│ └—————————→ Lock
│ (sw) ↓ (po)
│ Rv(v):o
│ ↓ (po)
│ W(o[0]=val2)
│ ↓ (po)
↓ (sw) Unlock
Rv(v):o
↓ (po)
R(o[0]):?
Together program order and synchronized-with give us happens-before:
W(o=[val0])
↓ (hb)
Wv(v=o)
┌——————————┤
│ T2 │
│ ↓ (hb)
│ Lock
│ ↓ (hb)
│ Rv(v):o
│ ↓ (hb)
│ W(o[0]=val1)
│ ↓ (hb)
│ Unlock
│ └—————————————┐
│ T3 │
│ ↓ (hb)
│ Lock
│ ↓ (hb)
│ Rv(v):o
│ ↓ (hb)
│ W(o[0]=val2)
T1 │ ↓ (hb)
↓ (hb) Unlock
Rv(v):o
↓ (hb)
R(o[0]):?
As you can see:
o[0]
:
W(o=[val0])
in Initial writesR(o[0])
in T1W(o[0]=val1)
in T2W(o[0]=val2)
in T3o[0]
in T1 is not ordered by happens-before with writes to o[0]
in T2 and T3This is a data race by the definition from the JLS:
When a program contains two conflicting accesses (§17.4.1) that are not ordered by a happens-before relationship, it is said to contain a data race.
As a result, the read of o[0]
in T1 in this execution can return:
val0
from Initial writes)val1
from T2 or val2
from T3)Of course, it's just one possible execution, there are many other possible executions.