I tried looking for the answer to this question in the Intel 64/IA-32 manuals, but couldn't find a definitive answer. The question is: do memory ordering instructions, such as SFENCE, affect the local processor only, or do they spread to the entire cache coherence domain, such as CPUs on a neighboring socket (in a multi-socket system)?
SFENCE affects the order in which the local CPU's stores become globally visible to other cores on the same and other sockets, or to memory-mapped I/O. Other cores can't tell whether you ran SFENCE or not; all they can observe is the order of your memory operations (i.e. the implementation of sfence is internal to a core and its store queue).
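To make the "local fence, globally visible ordering" point concrete, here is a minimal sketch of a producer that writes data with a non-temporal store and then publishes a flag. The names `payload`, `ready`, `producer`, `consumer` and `publish_and_read` are hypothetical, made up for this illustration. The fence runs only on the producer's core; the consumer (on whatever core or socket) just sees the two stores in the intended order. On non-x86 targets the sketch assumes a plain store as a fallback so it still compiles.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

#if defined(__SSE2__) || defined(_M_X64)
#include <immintrin.h>  // _mm_stream_si32, _mm_sfence
#define HAVE_X86_NT 1
#endif

alignas(64) int payload = 0;   // hypothetical shared data
std::atomic<int> ready{0};     // hypothetical "data is published" flag

void producer(int value) {
#ifdef HAVE_X86_NT
    _mm_stream_si32(&payload, value);  // weakly-ordered (non-temporal) store
    _mm_sfence();                      // order the NT store before any later store
#else
    payload = value;                   // assumption: plain store on other targets
#endif
    // On x86 this is an ordinary store; without the sfence above it would
    // not be ordered after the NT store.
    ready.store(1, std::memory_order_release);
}

int consumer() {
    // The consumer never executes a fence instruction of its own here;
    // it only observes the order in which the producer's stores appear.
    while (ready.load(std::memory_order_acquire) == 0) {
        std::this_thread::yield();
    }
    return payload;
}

// Runs one producer and one consumer and returns what the consumer saw.
int publish_and_read(int value) {
    payload = 0;
    ready.store(0);
    int seen = -1;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer, value);
    p.join();
    c.join();
    return seen;
}
```

With this sketch, `publish_and_read(42)` should return 42: once the consumer sees the flag, the payload written before the fence is visible too.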
sfence was introduced in SSE1, with PIII, before the first multi-core CPUs. At that time, the only SMP systems were multi-socket.
Also note that sfence only does anything useful with weakly-ordered stores (movnt* or stores to write-combining memory regions). Normal stores already have "release" semantics on x86. Only mfence (and lock-prefixed instructions) matter for normal memory operations on x86, to prevent StoreLoad reordering.
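The StoreLoad case can be sketched with the classic "store buffer" litmus test, written here in portable C++ rather than with raw instructions. The function name `count_both_zero` is hypothetical. Each thread stores to one variable and then loads the other. With sequentially consistent atomics, the outcome where both loads return 0 is forbidden; on x86, compilers typically get that guarantee by emitting an xchg (implicitly locked) or a mov followed by mfence for the store. With only release stores and acquire loads, the both-zero outcome would be allowed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the store-buffer litmus test `iterations` times and returns how
// often both threads read 0 (i.e. each load passed the other's store).
int count_both_zero(int iterations) {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int both_zero = 0;

    for (int i = 0; i < iterations; ++i) {
        x.store(0);
        y.store(0);
        int r1 = -1;
        int r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_seq_cst);   // store, then...
            r1 = y.load(std::memory_order_seq_cst);  // ...load a different location
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        t1.join();
        t2.join();

        if (r1 == 0 && r2 == 0) {
            ++both_zero;  // would indicate StoreLoad reordering
        }
    }
    return both_zero;
}
```

Under the C++ memory model, `count_both_zero(n)` must return 0 for any `n` as long as the operations stay `seq_cst`.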