I'm writing some optimizations for single-precision floating-point calculations using SIMD intrinsics.
Sometimes a pd (double-precision) instruction does what I want more easily than any ps (single-precision) one.
Example 1:
I have a pointer float* ptr which points to a block of floats: f0 f1 f2 f3 etc.
I want to load an __m256 value with [ f0, f1, f0, f1, f0, f1, f0, f1 ]. I didn't find a 64-bit broadcast for the __m256 data type. Can I use _mm256_broadcast_sd on floats?
float* ptr = ...; // pointer to some memory chunk aligned to 4 bytes
__m256 vat = _mm256_castpd_ps( _mm256_broadcast_sd( ( double* )ptr ) );
Example 2:
I have an __m256 value [f0, f1, f2, f3, f4, f5, f6, f7]. Can I use shift instructions like _mm256_srl_epi32, which take __m256i arguments, to manipulate my __m256 value?
I checked it in practice and it works, but is it correct to use instructions with different types like this?
Yes, vbroadcastsd is a good asm instruction for broadcasting a pair of floats, and _mm256_broadcast_sd + a cast intrinsic is a safe way to implement it in C.
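A minimal sketch of that broadcast, assuming GCC or Clang on x86-64 with an AVX-capable CPU. The helper name broadcast_float_pair and the choice to store the result into a plain float array are only for illustration; the target attribute is there so the snippet compiles without -mavx and can be dropped if you already build with -mavx or -march=native.

```c
#include <immintrin.h>

/* Hypothetical helper: fills out[0..7] with in[0], in[1] repeated four times. */
__attribute__((target("avx")))
static void broadcast_float_pair(const float *in, float *out)
{
    /* The cast only changes the pointer type handed to the intrinsic;
       no double object is dereferenced in C code here. */
    __m256d pair_as_pd = _mm256_broadcast_sd((const double *)in);

    /* Reinterpret the same 256 bits as eight floats (no conversion, no instruction). */
    __m256 pairs = _mm256_castpd_ps(pair_as_pd);

    /* Unaligned store, so out needs no special alignment. */
    _mm256_storeu_ps(out, pairs);
}
```

Calling it on a block f0 f1 f2 f3 should leave out holding f0 f1 f0 f1 f0 f1 f0 f1.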
Note that you aren't dereferencing (in pure C) a double* that points at float objects. You're only passing it to an intrinsic function. _mm256_set1_pd( *(double*)floatp ) would be strict-aliasing undefined behaviour in C, but load/store intrinsics are defined to work regardless of what the pointer is actually pointing at. That's exactly so you can easily do wide loads/stores to whatever data you actually have, not just __int64 or double.
For example, GCC's header defines _mm256_broadcast_sd(const double*) as a wrapper around __builtin_ia32_vbroadcastsd256. And GCC defines _mm_loadl_epi64 to include a dereference of *(__m64_u *)__P, where __m64_u is an unaligned may-alias version of __m64, which it defines as:
typedef int __m64_u __attribute__ ((__vector_size__ (8), __may_alias__, __aligned__ (1)));
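The same attributes can be used in your own code. The following is a small sketch assuming GCC or Clang (may_alias and aligned are GNU C extensions); the type name u64_any and both helper functions are made up for this example and are not part of any header. It reads 8 bytes through a may-alias, alignment-1 type and, for comparison, through memcpy, which is the portable way to do the same thing.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical type: a 64-bit integer that may alias any object
   and carries no alignment requirement (GNU C attributes). */
typedef uint64_t u64_any __attribute__((__may_alias__, __aligned__(1)));

/* Load 8 bytes from any address by dereferencing the may-alias type. */
static inline uint64_t load_u64_may_alias(const void *p)
{
    return *(const u64_any *)p;
}

/* Portable equivalent: memcpy is always allowed to copy object bytes. */
static inline uint64_t load_u64_memcpy(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Both helpers should return the same bit pattern when pointed at a pair of floats, including at an address that is only 4-byte aligned.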
In general, even load/store intrinsics that take a float* or double* (instead of __m128i*) are alignment- and strict-aliasing-safe. (Or at least I think they're supposed to be. Some compilers might have a few that aren't actually strict-aliasing safe, so it can be a pain to get them to safely emit vpbroadcastd from a pointer that isn't actually pointing at an int, for example. I forget which intrinsic it was that I found some compiler not respecting possible aliasing for.)
Your example 2 is not clear. Do you want to bit-shift the bit patterns of floats? Yes, of course you can do that. That's why SIMD cast intrinsics exist: to keep the C compiler happy when you want to reinterpret the same bits as a different vector type.
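As one possible reading of example 2, here is a sketch that shifts the raw bit patterns of eight floats to pull out their biased exponent fields. It assumes GCC or Clang and an AVX2-capable CPU (256-bit integer shifts need AVX2). The helper name biased_exponents is hypothetical, and it uses the immediate-count form _mm256_srli_epi32 rather than _mm256_srl_epi32, which takes its count in an __m128i.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: writes the 8-bit biased exponent of each of 8 floats. */
__attribute__((target("avx2")))
static void biased_exponents(const float *in, int32_t *out)
{
    __m256 v = _mm256_loadu_ps(in);

    /* Reinterpret the float bits as 32-bit integers (no value conversion). */
    __m256i bits = _mm256_castps_si256(v);

    /* Shift left by 1 to discard the sign bit, then a logical right shift
       by 24 leaves the 8 exponent bits in the low byte of each lane. */
    __m256i no_sign = _mm256_slli_epi32(bits, 1);
    __m256i exponent = _mm256_srli_epi32(no_sign, 24);

    _mm256_storeu_si256((__m256i *)out, exponent);
}
```

For an input of 1.0f this gives 127 (the IEEE 754 single-precision bias), and for 8.0f it gives 130.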
It's common to do that as part of implementing exp() or log(), for example. See "Fastest Implementation of Exponential Function Using AVX".
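To give a flavour of how integer shifts show up in such code, here is a sketch of one common building block: constructing 2^n for integer n by writing the exponent field directly. This is not taken from the linked answer. It assumes GCC or Clang, an AVX2-capable CPU, and n in the range [-126, 127] so the result is a normal float; the helper name exp2_int is hypothetical.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = 2^n[i] for 8 integers, each in [-126, 127]. */
__attribute__((target("avx2")))
static void exp2_int(const int32_t *n, float *out)
{
    __m256i vn = _mm256_loadu_si256((const __m256i *)n);

    /* Add the single-precision exponent bias (127). */
    __m256i biased = _mm256_add_epi32(vn, _mm256_set1_epi32(127));

    /* Move the biased exponent into bits 23..30; sign and mantissa stay zero. */
    __m256i bits = _mm256_slli_epi32(biased, 23);

    /* Reinterpret the integer bit pattern as floats and store. */
    _mm256_storeu_ps(out, _mm256_castsi256_ps(bits));
}
```

A full exp() would typically multiply a value like this by a polynomial approximation of the fractional part, which is outside the scope of this sketch.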