x86 Non-Temporal Instructions: Is fencing ever needed for thread-local data?


On x86/x64, non-temporal store instructions such as MOVNTI and MOVNTPS make weaker memory ordering guarantees than "regular" stores. I understand that fences (e.g. SFENCE) are necessary when memory written with non-temporal stores is shared across threads. However, are fence instructions ever necessary for thread-local memory? If I write to a location via MOVNTPS, is the write guaranteed to be visible to subsequent instructions in the same thread without any fence instruction?


Solution

  • Yes, they will be visible without fences. See Section 8.2.2, "Memory Ordering in P6 and More Recent Processor Families", in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1, which says, among other things:

    for memory regions defined as write-back cacheable, [...] Reads may be reordered with older writes to different locations but not with older writes to the same location.

    and

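    Writes to memory are not reordered with other writes, with the following exceptions:
      - streaming stores (writes) executed with the non-temporal move instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD);

    The exception for streaming stores concerns the order in which writes become visible relative to other writes, which matters to other processors. It does not allow a read to be reordered with an older write to the same location, so a thread reads back its own non-temporal writes in program order.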
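To make the same-thread case concrete, here is a minimal sketch in C using SSE intrinsics. The helper name `nt_store_then_load` is hypothetical and only for illustration. It writes four floats with `_mm_stream_ps` (which compiles to MOVNTPS) and reads them back with ordinary loads in the same thread, with no fence in between. The plain-store fallback for non-x86 builds is an assumption added only so the sketch compiles elsewhere; it does not exercise a non-temporal store.

```c
#include <string.h>

#if defined(__x86_64__) || defined(_M_X64) || defined(__SSE__)
#include <xmmintrin.h>  /* _mm_stream_ps, _mm_set_ps, _mm_sfence */
#define HAVE_SSE_NT 1
#else
#define HAVE_SSE_NT 0   /* assumption: fall back to a plain store */
#endif

/* Hypothetical helper: returns 1 if the thread observes its own
   non-temporal store without executing a fence first. */
static int nt_store_then_load(void)
{
    /* MOVNTPS requires a 16-byte-aligned destination. */
    static _Alignas(16) float buf[4];
    const float expected[4] = {1.0f, 2.0f, 3.0f, 4.0f};

#if HAVE_SSE_NT
    /* _mm_set_ps takes lanes from high to low, so buf[0] becomes 1.0f. */
    _mm_stream_ps(buf, _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f));
#else
    memcpy(buf, expected, sizeof buf);
#endif

    /* No SFENCE here: a load is not reordered with an older store to the
       same location, so this thread sees the values it just wrote. */
    int ok = memcmp(buf, expected, sizeof buf) == 0;

#if HAVE_SSE_NT
    /* A fence only becomes relevant before publishing buf to another
       thread, e.g. before setting a "ready" flag that thread polls. */
    _mm_sfence();
#endif
    return ok;
}
```

Calling `nt_store_then_load()` from any single thread should return 1. The trailing `_mm_sfence()` is not needed for that result; it marks where a fence would go if the buffer were subsequently handed to another thread.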