Search code examples
c#multithreadingoptimizationvolatilememory-model

Does Interlocked.CompareExchange use a memory barrier?


I'm reading Joe Duffy's post about Volatile reads and writes, and timeliness, and i'm trying to understand something about the last code sample in the post:

while (Interlocked.CompareExchange(ref m_state, 1, 0) != 0) ;
m_state = 0;
while (Interlocked.CompareExchange(ref m_state, 1, 0) != 0) ;
m_state = 0;
… 

When the second CMPXCHG operation is executed, does it use a memory barrier to ensure that the value of m_state is indeed the latest value written to it? Or will it just use some value that is already stored in the processor's cache? (assuming m_state isn't declared as volatile).
If I understand correctly, if CMPXCHG won't use a memory barrier, then the whole lock acquisition procedure won't be fair since it's highly likely that the thread that was the first to acquire the lock, will be the one that will acquire all of following locks. Did I understand correctly, or am I missing out on something here?

Edit: The main question is actually whether calling to CompareExchange will cause a memory barrier before attempting to read m_state's value. So whether assigning 0 will be visible to all of the threads when they try to call CompareExchange again.


Solution

  • Any x86 instruction that has lock prefix has full memory barrier. As shown Abel's answer, Interlocked* APIs and CompareExchanges use lock-prefixed instruction such as lock cmpxchg. So, it implies memory fence.

    Yes, Interlocked.CompareExchange uses a memory barrier.

    Why? Because x86 processors did so. From Intel's Volume 3A: System Programming Guide Part 1, Section 7.1.2.2:

    For the P6 family processors, locked operations serialize all outstanding load and store operations (that is, wait for them to complete). This rule is also true for the Pentium 4 and Intel Xeon processors, with one exception. Load operations that reference weakly ordered memory types (such as the WC memory type) may not be serialized.

    volatile has nothing to do with this discussion. This is about atomic operations; to support atomic operations in CPU, x86 guarantees all previous loads and stores to be completed.