multithreading x86 locking x86-64 mutual-exclusion

Exit critical region

Consider several threads concurrently executing the following code:

long gf = 0;// global variable or class member

//...

if (InterlockedCompareExchange(&gf, 1, 0)==0) // lock cmpxchg
{
    // some exclusive code - must not execute concurrently
    gf = 0; // is this OK? or do we need:
    //InterlockedExchange(&gf, 0); // [lock] xchg
}

Treat the code above as C-like pseudo-code, which will be translated more or less directly into assembly without the usual concessions to compiler optimizations such as reordering and store elimination.

So after some thread exclusively acquires the flag gf, is it enough to write a zero (as in gf = 0) to exit the critical region, or does this need to be interlocked, as in InterlockedExchange(&gf, 0)?

If both are OK, which is better for performance, assuming that with high probability several cores call InterlockedCompareExchange(&gf, 1, 0) concurrently?

Several threads periodically execute this code (from several places, when some events fire), and it is important that the next thread can enter the critical region as soon as possible after it is freed.

Solution

Related: Spinlock with XCHG explains why you don't need xchg to release a lock in x86 asm, just a store instruction.

But in C++, you need something stronger than a plain gf = 0; on a plain long gf variable. The C / C++ memory model (for normal variables) is very weakly ordered, even when compiling for strongly-ordered x86, because that's essential for optimizations.

You need a release store to correctly release a lock, so that operations in the critical section cannot leak out of it by reordering, at compile time or at run time, with the gf = 0 store. See http://preshing.com/20120913/acquire-and-release-semantics/
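As a rough illustration, the try-once pattern from the question could be written with std::atomic<long> roughly as follows. This is only a sketch: the names try_run_exclusive and shared_counter are made up for the example, and compare_exchange_strong stands in for InterlockedCompareExchange.

```cpp
#include <atomic>
#include <cassert>

std::atomic<long> gf{0};   // 0 = free, 1 = held
long shared_counter = 0;   // hypothetical data protected by gf (plain, non-atomic)

// Try once to enter the critical region.
// Returns true if this thread ran the exclusive code, false if gf was already held.
bool try_run_exclusive() {
    long expected = 0;
    // Acquire on success: work in the critical region cannot be hoisted above this.
    // On x86 this compiles to lock cmpxchg.
    if (!gf.compare_exchange_strong(expected, 1,
                                    std::memory_order_acquire,
                                    std::memory_order_relaxed)) {
        return false;      // another thread holds the flag
    }

    ++shared_counter;      // the exclusive code

    // Release store: work in the critical region cannot sink below this.
    // On x86 this is an ordinary mov, with no lock prefix and no xchg.
    gf.store(0, std::memory_order_release);
    return true;
}
```

The only difference from the question's gf = 0 at the machine level is none at all on x86; the difference is that the compiler is now forbidden from moving or eliminating the store.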

Since you're using long gf, not volatile long gf, and you aren't using a compiler memory barrier, nothing in your code would prevent compile-time reordering. (x86 asm stores have release semantics, so it's only compile-time reordering we need to worry about.) http://preshing.com/20120625/memory-ordering-at-compile-time/

We get everything we need as cheaply as possible using std::atomic<long> gf; and gf.store(0, std::memory_order_release);. AFAIK, atomic<long> is lock-free on every platform that supports InterlockedExchange, so you should be OK to mix and match. (Or just use gf.exchange() to take the lock.)

If you are rolling your own locks, keep in mind that you should loop on a read-only operation plus _mm_pause() while waiting for the lock. Don't hammer away with xchg or lock cmpxchg, which can delay the unlock. See Locks around memory manipulation via inline assembly.
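A minimal sketch of a blocking lock along those lines is shown here. The SpinLock class, cpu_relax helper, and count_with_spinlock driver are hypothetical names for this example, and the fallback to std::this_thread::yield() on non-x86 targets is an assumption added so the code compiles elsewhere.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
#include <immintrin.h>
static inline void cpu_relax() { _mm_pause(); }            // x86 pause instruction
#else
static inline void cpu_relax() { std::this_thread::yield(); } // assumed fallback
#endif

class SpinLock {
    std::atomic<long> flag_{0};   // 0 = free, 1 = held
public:
    void lock() {
        for (;;) {
            // One atomic read-modify-write attempt (xchg on x86).
            if (flag_.exchange(1, std::memory_order_acquire) == 0)
                return;
            // Lock is held: wait with plain loads only, so the cache line is
            // not repeatedly taken in exclusive state while the owner is
            // trying to store 0 to it.
            while (flag_.load(std::memory_order_relaxed) != 0)
                cpu_relax();
        }
    }
    void unlock() {
        // A release store is sufficient; no xchg is needed to unlock.
        flag_.store(0, std::memory_order_release);
    }
};

// Hypothetical driver: each thread increments a plain counter under the lock.
long count_with_spinlock(int n_threads, int iters) {
    assert(n_threads > 0 && iters >= 0);
    SpinLock lock;
    long counter = 0;             // protected by lock
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

If the lock works, count_with_spinlock(n, k) returns exactly n * k. For anything beyond an experiment, std::mutex is usually the better choice, since a pure spinlock wastes CPU time whenever the owner is descheduled.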

This is one of the cases where the warning in Why is integer assignment on a naturally aligned variable atomic on x86? applies: you need atomic<> to make sure the compiler actually does the store where and when you need it.