Search code examples
x86atomiccpu-architecturecompare-and-swapmesi

Why can the MESI protocol not guarantee atomicity of CMPXCHG on x86 without the LOCK prefix?


I understand that the MESI protocol successfully guarantees the same view of memory (caches) for different cores. My question comes from the fact that during writing MESI guarantees that the cache is exclusively owned by a CPU and then atomic CMPXCHG just compares and exchanges values atomically. So why do we need to use the LOCK instruction and thus lock the cache line when we already have that guarantee from the MESI protocol?


Solution

  • atomic CMPXCHG just compares and exchanges values atomically

    No, the cache-access hardware doesn't implement CMPXCHG as a single-cycle inherently-atomic operation. It's synthesized out of multiple uops that load and separately store.

    If that's how regular CMPXCHG worked, your reasoning would be correct. But regular CMPXCHG is not atomic (for observers on other cores).


    lock cmpxchg decodes to multiple uops that keep the cache-line "locked" from the load to the store, turning it into a single atomic transaction as far as any other observers in the system can see. (i.e. delay responding to MESI invalidate or share requests for that line until after the store commits). It also makes it a full memory barrier.


    Without lock, CMPXCHG decodes to multiple uops that load, check for equality stuff, and then either store a new value or not according to the compare result. As far as atomicity, it's the same as add [mem], edx, which uses the ALU for addition in between load and store uops. i.e. it's not atomic, except on the same core with respect to interrupts (because interrupts can only happen at an instruction boundary).

    The load and store are each separately atomic, but they aren't a single atomic RMW transaction. If another core invalidates our copy of the cache line and stores a new value between our load and our store, our store will step on the other store. And that other store will appear in the global order of operations on that cache line between our load and store, violating the definition of "atomic" = indivisible.