
Why does MSVC use SSE2 instructions for such a trivial thing?


The code:

double Ret_Value=0;

With default settings, VS2012 compiles this to:

10112128  xorps       xmm0,xmm0  
1011212E  movsd       mmword ptr [Ret_Value],xmm0

If SSE2 is disabled in the project settings, it compiles to:

101102AC  fldz  
101102AE  lea         eax,[Ret_Value]  
101102B1  push        eax  
101102B2  fstp        qword ptr [Ret_Value] 

Edit: I am not sure that the push and lea are related to this initialization. They may belong to code that follows; the disassembly just lists them under this line of C++.

Is SSE2 significantly better, apart from being 2 instructions shorter? What kind of optimization is being done here?

How this was discovered: the app started failing on an old processor that doesn't support SSE2.


Solution

  • The Intel Optimization Reference Manual, section 3.8.1 (Guidelines for Optimizing Floating-point Code), says:

    Enable the compiler’s use of SSE, SSE2 and more advanced SIMD instruction sets (e.g. AVX) with appropriate switches. Favor scalar SIMD code generation to replace x87 code generation.

    Section 3.8.5 goes on to explain:

    Use Streaming SIMD Extensions 2 or Streaming SIMD Extensions unless you need an x87 feature. Most SSE2 arithmetic operations have shorter latency than their X87 counterpart and they eliminate the overhead associated with the management of the X87 register stack.
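
For the failure on the old processor: as far as I know, VS2012 made SSE2 code generation the default for 32-bit builds, and the /arch:IA32 switch selects x87 code generation instead. If the binary has to stay SSE2-enabled, a startup check can at least fail cleanly rather than crash on an illegal instruction. The following is a minimal sketch, not a definitive implementation. The function names edx_reports_sse2 and cpu_has_sse2 are made up for illustration; the code relies on SSE2 being reported in bit 26 of EDX for CPUID leaf 1.

```cpp
// Pick the CPUID helper for the compiler in use (x86/x64 targets only).
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#define HAVE_MSVC_CPUID 1
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_GNU_CPUID 1
#endif

// Hypothetical helper: tests the SSE2 feature flag in the EDX value
// returned by CPUID leaf 1 (bit 26).
bool edx_reports_sse2(unsigned edx) {
    return (edx & (1u << 26)) != 0;
}

// Hypothetical helper: returns true if the running CPU reports SSE2.
// Returns false when no CPUID helper is available for this target.
bool cpu_has_sse2() {
#if defined(HAVE_MSVC_CPUID)
    int regs[4] = {0, 0, 0, 0};   // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);
    return edx_reports_sse2(static_cast<unsigned>(regs[3]));
#elif defined(HAVE_GNU_CPUID)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return false;             // leaf 1 not supported
    }
    return edx_reports_sse2(edx);
#else
    return false;
#endif
}
```

Call cpu_has_sse2() as early as possible, for example at the top of main, and exit with a message if it returns false. The check only helps if it runs before any SSE2 instruction executes, so the code path leading to it would itself need to be free of SSE2 instructions, for instance by living in a translation unit built with /arch:IA32.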