multithreading locking mutex deadlock memory-barriers

Are acquire/release semantics really enough for implementing critical sections?

(Here, by critical section, I mean any synchronization mechanism that prevents concurrent access to some resource.)

It seems like the consensus on the web is that you only need acquire semantics when entering a critical section and release semantics when leaving it. But doesn't this open up the possibility of a deadlock?

Here is some pseudo-code to explain what I mean. Here is the original code:

Thread 1:
    enter A // acquire semantics
    // ... some work within A
    leave A // release semantics

    enter B // acquire semantics
    // ... some work within B
    leave B // release semantics

Thread 2:
    enter B // acquire semantics
    // ... some work within B
    leave B // release semantics

    enter A // acquire semantics
    // ... some work within A
    leave A // release semantics

When executing this code, the CPU could legally transform it into this (nothing moves in front of acquires, nothing moves behind releases):

Thread 1:
    enter A // acquire semantics
    enter B // acquire semantics
    // ... some work within A
    // ... some work within B
    leave A // release semantics
    leave B // release semantics

Thread 2:
    enter B // acquire semantics
    enter A // acquire semantics
    // ... some work within B
    // ... some work within A
    leave B // release semantics
    leave A // release semantics

But now, we have a deadlock hazard which wasn't here before! Two threads are entering more than one critical section, but in a different order.

So don't critical sections need to prevent store/load reordering as well? I.e. don't they need sequentially consistent semantics instead of just acquire/release? Why is this not specified

Solution

Acquire and release are from the release consistency model (RC).
In RC there are 2 types of memory operations:

ordinary reads and writes
special operations, which are divided into 2 classes:
- sync (used for synchronization): acquire and release
- nsync (not used for synchronisation): the same as memory_order_relaxed in C++

This only applies to ordinary reads and writes:

nothing moves in front of acquires, nothing moves behind releases

Special operations have their own reordering rules.
There are 2 types of RC:

with sequentially consistent special operations (RCsc)
with processor consistent special operations (RCpc)

Here are the allowed reorderings according to the wiki:

For sequential consistency (RCsc), the constraints are:

acquire → all,

all → release,

special → special.

For processor consistency (RCpc) the write to read program order is relaxed, having constraints:

acquire → all,

all → release,

special → special (except when special write is followed by special read).

Note: the above notation A → B, implies that if the operation A precedes B in the program order, then program order is enforced.

In case of RCsc special operations aren't allowed to reorder, so the reordering in your example isn't legal.

In case of RCpc it's a little bit more complicated.
Entering a critical section is actually 2 special operations executed atomically:

an acquire read of a synchronization variable
a write to the same synchronization variable which marks the lock as taken and prevents other threads from entering this critical section

As a result:

entering a critical section includes the special write
leaving a critical section is a release (i.e. also a special write)

Special writes aren't allowed to reorder in RCpc, so the reordering in your example isn't legal in RCpc.