Tags: multithreading, assembly, cpu-architecture, cpu-cache, memory-barriers

What is the relationship between cache coherence and memory barriers?


As far as I know, memory barriers are used to avoid out-of-order execution. However, memory barriers are also often mentioned in discussions of cache coherence. I'm not sure how the two concepts are connected, since, from what I have found, cache coherence should already be guaranteed at the hardware level through protocols such as MESI. Is preventing out-of-order execution with memory barriers another way to (manually) guarantee cache coherence?


Solution

  • On modern CPUs, stores first go into a store buffer. The cache coherence protocol only gets involved when the store leaves the store buffer and is applied to the cache line.

    While the store is pending in the store buffer, the CPU that made the store can read it back from the store buffer (store-to-load forwarding), but other CPUs cannot observe the effects of the store yet.

    Memory barriers such as x86's MFENCE wait for the store buffer to drain. The Intel manual describes MFENCE as follows:

    Performs a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior the MFENCE instruction. This serializing operation guarantees that every load and store instruction that precedes the MFENCE instruction in program order becomes globally visible before any load or store instruction that follows the MFENCE instruction. The MFENCE instruction is ordered with respect to all load and store instructions, other MFENCE instructions, any LFENCE and SFENCE instructions, and any serializing instructions (such as the CPUID instruction). MFENCE does not serialize the instruction stream.

    See Memory Barriers: a Hardware View for Software Hackers for more details.