English is not my first language, so please forgive any grammar mistakes.
My machine environment:
AMD 5900X, Windows 10 (latest), VS2022 with the latest MSVC.
The following code compiles without errors (Debug-x86 and Release-x86, MSVC) on my machine and on my colleague's machine (which is very similar to mine), but the output is not what I expected:
#include <intrin.h>
#include <iostream>
#include <xmmintrin.h>
int main(int argc, char* argv[])
{
    union
    {
        float f[2];
        __m64 m;
    } a = {{10.f, 200.f}};
    a.m = _mm_shuffle_pi16(a.m, _MM_SHUFFLE(1, 0, 3, 2));
    std::cout << a.f[0] << " " << a.f[1] << std::endl;
    std::cout << a.f[0] << " " << a.f[1] << std::endl;
    return 0;
}
Expected output:
200 10
200 10
Actual output:
-nan(ind) 10
200 10
Note: no arithmetic is performed between the two std::cout statements.
I have checked the underlying binary data, and both values are valid IEEE 754 single-precision floats.
The disassembly also looks normal.
This really makes me wonder: is there an upstream bug, or an issue with my compiler environment? Has anyone run into something similar? Any advice, or is further information needed? Thanks in advance.
UPDATE 1: I ran the same code on:
Linux kali 6.3.0-kali1-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.3.7-1kali1 (2023-06-29) x86_64 GNU/Linux
gcc (Debian 13.1.0-6) 13.1.0
g++ (Debian 13.1.0-6) 13.1.0
Debian clang version 14.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Compile commands:
g++ a.cpp
clang++ -o aaa a.cpp
Output (same for both compilers):
200 10
200 10
While I don't know exactly what happened between the assignment to a.m and the printing, there is a bug in the code itself: _mm_shuffle_pi16 is an MMX intrinsic, and you did not call _mm_empty (or _m_empty), so the FPU is still in MMX mode. The MMX registers alias the x87 register stack, and every MMX instruction marks all x87 registers as in use, so x87 instructions executed afterwards misbehave: a load onto the apparently full stack raises a (masked) stack-overflow exception and produces the "real indefinite" NaN, which MSVC prints as -nan(ind). You compiled for x86 (32-bit), so it is likely that x87 instructions were used at some point. 64-bit code mostly uses SSE rather than x87 for floating point, so there it may seem that nothing bad happened, which could explain why the code appeared to work on 64-bit Kali Linux.
MMX is retrocomputing at this point. You could add _mm_empty to fix this code, but you may as well use SSE instead. Ditch the union (you can use the "cast" family of intrinsics, such as _mm_castps_si128, to reinterpret vectors safely if needed, but you don't need that here since SSE has a floating-point shuffle), and do something like this:
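// lanes: {10, 200, 0, 0}
__m128 a = _mm_setr_ps(10.0f, 200.0f, 0.0f, 0.0f);
// _MM_SHUFFLE(3, 2, 0, 1) selects lanes {1, 0, 2, 3}: swaps the first two floats
a = _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 2, 0, 1));
float test[4];
_mm_storeu_ps(test, a); // unaligned store back to an ordinary array
std::cout << test[0] << " " << test[1] << std::endl; // prints "200 10"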
There have historically been some compiler bugs involving _mm_empty, but that is not what you are seeing here: your program does not call _mm_empty at all.