I have a function, shown below, where I want to reproduce the behavior of line (1) with a non-temporal store. Is the use of _mm_stream_si64 in line (2) a correct way to reproduce line (1)?
void func(void *dest, void *src){
    *(void **)(dest) = src;        /* (1) */
    _mm_stream_si64(dest, src);    /* (2) */
}
I am not entirely sure I am using _mm_stream_si64 correctly, because it supposedly expects an __int64 as its second parameter (_mm_stream_si64(__int64* mem_addr, __int64 a)). For mem_addr, though, I hope passing a void * is fine.
Or is there any other intrinsic store I could use?
It's only correct in 64-bit mode, where a void* is 64-bit. But then yes, that's correct. It's an 8-byte alignment-required store that's strict-aliasing safe (it doesn't care about C types, only bytes of memory, like memcpy but with an alignment requirement). So it's definitely safe if dest was correctly aligned to hold a void*.
That said, movnti with 64-bit operand-size is only available at all in 64-bit mode, so the only time that would compile but be wrong is with gcc -mx32 (32-bit pointers in 64-bit mode).
You should include a static_assert( sizeof(src) == sizeof(int64_t) ), or an if(sizeof(src) == 4) to select _mm_stream_si32.
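As a rough sketch of that suggestion, the wrapper below picks the intrinsic by pointer width and adds the casts the intrinsics' parameter types call for. The name store_ptr_nt is hypothetical, and choosing with #if on UINTPTR_MAX instead of a runtime if(sizeof(src) == 4) is an assumption on my part: it keeps 32-bit builds from referencing _mm_stream_si64, which GCC's headers only declare for x86-64 targets. A 32-bit build would also need SSE2 enabled (e.g. -msse2).

```c
#include <assert.h>     /* static_assert (C11) */
#include <stdint.h>     /* intptr_t, UINTPTR_MAX */
#include <immintrin.h>  /* _mm_stream_si32 / _mm_stream_si64 */

/* Hypothetical wrapper: non-temporal equivalent of *(void **)dest = src.
 * dest must be suitably aligned to hold a void*. */
static inline void store_ptr_nt(void *dest, void *src)
{
#if UINTPTR_MAX > 0xFFFFFFFFu
    /* 64-bit pointers: one 8-byte movnti. */
    static_assert(sizeof(src) == sizeof(int64_t), "expected 64-bit pointers");
    _mm_stream_si64((long long *)dest, (long long)(intptr_t)src);
#else
    /* 32-bit pointers (including gcc -mx32): one 4-byte movnti. */
    static_assert(sizeof(src) == sizeof(int32_t), "expected 32-bit pointers");
    _mm_stream_si32((int *)dest, (int)(intptr_t)src);
#endif
}
```

Calling store_ptr_nt(&slot, p) on a void *slot should then leave slot equal to p, just as line (1) would.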
Note that NT stores are generally only efficient if you're writing a full cache line with them (all 64 bytes of a naturally-aligned 64-byte chunk). So a single movnti will often be worse for performance, unless you're calling this or other wrapper functions for each element of a struct or something similar.
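To illustrate the full-line case, here is a minimal sketch that covers one naturally-aligned 64-byte chunk with eight 8-byte NT stores. The helper name store_line_nt and its interface are hypothetical, and it assumes 64-bit mode (8-byte pointers). The trailing _mm_sfence() is my addition, not part of the advice above: NT stores are weakly ordered, so a fence is the usual way to order them before later stores if another thread will read the data.

```c
#include <assert.h>     /* assert, static_assert (C11) */
#include <stdint.h>     /* intptr_t, uintptr_t */
#include <immintrin.h>  /* _mm_stream_si64, _mm_sfence */

/* Hypothetical helper: write 8 pointers (64 bytes) to a 64-byte-aligned
 * destination using only non-temporal stores. 64-bit mode only. */
static inline void store_line_nt(void *dest_line, void *const src[8])
{
    static_assert(sizeof(void *) == 8, "sketch assumes 64-bit pointers");
    assert(((uintptr_t)dest_line & 63) == 0);  /* start of a 64-byte line */

    long long *out = (long long *)dest_line;
    for (int i = 0; i < 8; ++i)
        _mm_stream_si64(out + i, (long long)(intptr_t)src[i]);

    /* Assumption: only needed when ordering against later stores matters. */
    _mm_sfence();
}
```

The destination would typically be declared with something like _Alignas(64) so that all eight stores land in the same line.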