(Here, by critical section, I mean any synchronization mechanism that prevents concurrent access to some resource.)
It seems like the consensus on the web is that you only need acquire semantics when entering a critical section and release semantics when leaving it. But doesn't this open up the possibility of a deadlock?
Here is some pseudo-code to explain what I mean. Here is the original code:
Thread 1:
enter A // acquire semantics
// ... some work within A
leave A // release semantics
enter B // acquire semantics
// ... some work within B
leave B // release semantics
Thread 2:
enter B // acquire semantics
// ... some work within B
leave B // release semantics
enter A // acquire semantics
// ... some work within A
leave A // release semantics
When executing this code, the CPU could legally transform it into this (nothing moves in front of acquires, nothing moves behind releases):
Thread 1:
enter A // acquire semantics
enter B // acquire semantics
// ... some work within A
// ... some work within B
leave A // release semantics
leave B // release semantics
Thread 2:
enter B // acquire semantics
enter A // acquire semantics
// ... some work within B
// ... some work within A
leave B // release semantics
leave A // release semantics
But now, we have a deadlock hazard which wasn't here before! Two threads are entering more than one critical section, but in a different order.
So don't critical sections need to prevent store/load reordering as well? I.e. don't they need sequentially consistent semantics instead of just acquire/release? Why is this not specified
Acquire and release are from the release consistency model (RC).
In RC there are 2 types of memory operations:
memory_order_relaxed
in C++This only applies to ordinary reads and writes:
nothing moves in front of acquires, nothing moves behind releases
Special operations have their own reordering rules.
There are 2 types of RC:
Here are the allowed reorderings according to the wiki:
For sequential consistency (RCsc), the constraints are:
- acquire → all,
- all → release,
- special → special.
For processor consistency (RCpc) the write to read program order is relaxed, having constraints:
- acquire → all,
- all → release,
- special → special (except when special write is followed by special read).
Note: the above notation A → B, implies that if the operation A precedes B in the program order, then program order is enforced.
In case of RCsc special operations aren't allowed to reorder, so the reordering in your example isn't legal.
In case of RCpc it's a little bit more complicated.
Entering a critical section is actually 2 special operations executed atomically:
As a result:
Special writes aren't allowed to reorder in RCpc, so the reordering in your example isn't legal in RCpc.