The code:
double Ret_Value=0;
With default settings, VS2012 compiles this to:
10112128 xorps xmm0,xmm0
1011212E movsd mmword ptr [Ret_Value],xmm0
If SSE2 is disabled in the project settings, it is compiled to:
101102AC fldz
101102AE lea eax,[Ret_Value]
101102B1 push eax
101102B2 fstp qword ptr [Ret_Value]
Edit: I am not sure that push and lea are related to this initialization. They may belong to code that follows it; the disassembly just shows them under this C++ line.
Is the SSE2 version significantly better, apart from being two instructions shorter? What kind of optimization is being done here?
How this was discovered: the app started to fail on an old processor which doesn't support SSE2.
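For a failure like this, one option is to check for SSE2 at startup and bail out with a clear message instead of crashing on an illegal instruction. The following is only a sketch: cpu_has_sse2 is a hypothetical helper name, and it assumes the CPU supports the CPUID instruction at all (it reads leaf 1 and tests EDX bit 26, the SSE2 feature flag). It uses __cpuid from <intrin.h> on MSVC and __get_cpuid from <cpuid.h> on GCC/Clang.

```cpp
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#define HAVE_X86_CPUID 1
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#endif

// Hypothetical helper (not from the original post): returns true when
// CPUID leaf 1 reports SSE2 support. Assumes the CPU implements CPUID.
bool cpu_has_sse2()
{
#if defined(_MSC_VER) && defined(HAVE_X86_CPUID)
    int regs[4] = {0, 0, 0, 0};            // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);                      // leaf 1: feature flags
    return (regs[3] & (1 << 26)) != 0;     // EDX bit 26 = SSE2
#elif defined(HAVE_X86_CPUID)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                      // leaf 1 not available
    return (edx & (1u << 26)) != 0;        // EDX bit 26 = SSE2
#else
    return false;                          // not an x86 target
#endif
}
```

Such a check only helps if the code that runs before it was itself built without SSE2; otherwise the process can fault before the check is reached.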
The Intel Optimization Reference Manual, section 3.8.1 (Guidelines for Optimizing Floating-point Code), says:
Enable the compiler’s use of SSE, SSE2 and more advanced SIMD instruction sets (e.g. AVX) with appropriate switches. Favor scalar SIMD code generation to replace x87 code generation.
Section 3.8.5 goes on to explain:
Use Streaming SIMD Extensions 2 or Streaming SIMD Extensions unless you need an x87 feature. Most SSE2 arithmetic operations have shorter latency then their X87 counterpart and they eliminate the overhead associated with the management of the X87 register stack.
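To see which of these instruction sets a given build was allowed to use, the compiler's predefined macros can be inspected. A minimal sketch, assuming MSVC's _M_IX86_FP (0 for /arch:IA32, 1 for /arch:SSE, 2 for /arch:SSE2 or higher) and the __SSE__ / __SSE2__ macros of GCC and Clang; enabled_fp_isa is a hypothetical helper name, not something from the question:

```cpp
#include <string>

// Hypothetical helper: names the floating-point instruction set this
// translation unit was compiled for, based on predefined macros only.
std::string enabled_fp_isa()
{
#if defined(_M_X64) || defined(__x86_64__)
    return "sse2";        // SSE2 is part of the x86-64 baseline
#elif defined(_M_IX86_FP)
    // MSVC, 32-bit x86: the value reflects the /arch option.
    #if _M_IX86_FP >= 2
    return "sse2";
    #elif _M_IX86_FP == 1
    return "sse";
    #else
    return "x87";
    #endif
#elif defined(__SSE2__)
    // GCC/Clang: SSE2 instructions are permitted. On 32-bit targets,
    // scalar math may still use x87 unless -mfpmath=sse is also given.
    return "sse2";
#elif defined(__SSE__)
    return "sse";
#elif defined(__i386__)
    return "x87";
#else
    return "non-x86";
#endif
}
```

Logging this value at startup is one way to confirm that a build meant for old processors really was compiled without SSE2.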