c x86 intrinsics

Correct usage of non-temporal store (_mm_stream_si64)


I have the function below, and I want to reproduce the behavior of line (1) using a non-temporal store. Is the call to _mm_stream_si64 in line (2) a correct way to do that?

void func(void *dest, void *src){
   *(void **)(dest) = src;        /* (1) */
   _mm_stream_si64(dest, src);    /* (2) */
}

I am not entirely sure I am using _mm_stream_si64 correctly, because its second parameter is supposed to be an __int64: void _mm_stream_si64(__int64* mem_addr, __int64 a). For mem_addr, I hope passing a void * is fine.

Or is there any other intrinsic store I could use?


Solution

  • It's only correct in 64-bit mode, where a void* is 64 bits wide. But then yes, that's correct. It's an 8-byte alignment-required store that's strict-aliasing safe: it doesn't care about C types, only bytes of memory, like memcpy but with an alignment requirement. So it's definitely safe if dest is correctly aligned to hold a void*.

    That said, movnti with 64-bit operand size is only available in 64-bit mode at all, so the only case that would compile but be wrong is gcc -mx32 (32-bit pointers in 64-bit mode).

    You should include a static_assert(sizeof(src) == sizeof(int64_t)), or an if (sizeof(src) == 4) check that selects _mm_stream_si32 instead.
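A minimal sketch of that size check, assuming GCC or Clang on x86 with C11. The helper name store_ptr_nt is hypothetical, and the preprocessor test on __x86_64__ / __ILP32__ is one possible way to pick the intrinsic at compile time rather than with a runtime if. The explicit casts are there because passing a void* where an integer is expected is an implicit pointer-to-integer conversion, which compilers diagnose.

```c
#include <assert.h>     /* static_assert (C11) */
#include <stdint.h>     /* intptr_t */
#include <emmintrin.h>  /* _mm_stream_si32, _mm_stream_si64 (SSE2) */

/* Hypothetical helper: store the pointer value `src` into `*dest` using a
 * non-temporal store. `dest` must be suitably aligned to hold a void*. */
static inline void store_ptr_nt(void *dest, void *src)
{
#if defined(__x86_64__) && !defined(__ILP32__)
    /* Ordinary 64-bit mode: pointers are 8 bytes, use the 64-bit movnti. */
    static_assert(sizeof(void *) == sizeof(long long),
                  "expected 64-bit pointers");
    _mm_stream_si64((long long *)dest, (long long)(intptr_t)src);
#else
    /* 32-bit mode or the x32 ABI: pointers are 4 bytes, use the 32-bit form. */
    static_assert(sizeof(void *) == sizeof(int),
                  "expected 32-bit pointers");
    _mm_stream_si32((int *)dest, (int)(intptr_t)src);
#endif
}
```

With this wrapper, a build for the x32 ABI takes the _mm_stream_si32 branch instead of silently writing 8 bytes into a 4-byte slot.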

    Note that NT stores are generally only efficient if you write a full cache line with them (all 64 bytes of a naturally aligned 64-byte chunk). So a single movnti will often be worse for performance, unless you're calling this or other wrapper functions for each element of a struct or something similar.
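To illustrate the full-line case, here is a rough sketch that covers whole 64-byte lines with eight 8-byte NT stores each. It assumes x86-64 with GCC or Clang; the helper name fill_lines_nt and its parameters are made up for this example. The trailing _mm_sfence is an addition not discussed above: NT stores are weakly ordered, so a fence is the usual way to order them before later stores that other threads might observe.

```c
#include <assert.h>     /* assert */
#include <stddef.h>     /* size_t */
#include <stdint.h>     /* uintptr_t */
#include <emmintrin.h>  /* _mm_stream_si64, _mm_sfence (via xmmintrin.h) */

/* Hypothetical helper: fill `n_lines` consecutive 64-byte lines starting at
 * `dest` with `value`, using only non-temporal 8-byte stores.
 * Assumption: `dest` is 64-byte aligned so each group of 8 stores covers
 * exactly one naturally aligned cache line. */
static inline void fill_lines_nt(void *dest, long long value, size_t n_lines)
{
    assert(((uintptr_t)dest & 63) == 0);

    long long *p = (long long *)dest;
    for (size_t i = 0; i < n_lines * 8; ++i)
        _mm_stream_si64(p + i, value);   /* 8 stores per 64-byte line */

    _mm_sfence();  /* order the NT stores before any later stores */
}
```

Whether this beats ordinary stores depends on the buffer size and on whether the data is read again soon, so it is worth measuring rather than assuming.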