I'm trying to optimize my exercise application in VS2010. Basically, I have several sqrt, pow and memset calls in the core loop. More specifically, this is what I do:
// in a cpp file ...
#include <cmath>
#pragma intrinsic(sqrt, pow, memset)

void Simulator::calculate()
{
    for( int i=0; i<NUM; i++ )
    {
        ...
        float len = std::sqrt(lenSq);
        distrib[0] = std::pow(baseVal, expVal);
        ...
        clearQuad(i); // invokes memset
    }
}
After the build, the disassembly shows that, for example, the sqrt call still compiles to "call _CIsqrt(0x####)". Nothing changes regardless of whether the /Oi flag is enabled.
Can anybody kindly explain how I can enable the intrinsic versions and how I can verify this in the disassembly? (I have also enabled /O2 in the project settings.)
Thank you
Edit: Problem solved by adding /fp:fast. For sqrt, as an example, the intrinsic version uses a single "fsqrt" instruction in place of the std version's "call __CIsqrt()". Sadly, in my case, the intrinsic version is 5% slower.
Many thanks to Zan Lynx and mch.
You are compiling to native machine code and not to the .NET CLR, right?
If you compile to .NET, the code won't be optimized until it is run through the JIT compiler. At that point, .NET applies its own intrinsics and other optimizations.
If you are compiling to native machine code, you might want to play with the /arch option and the /fp:fast option.
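To verify the result in the disassembly, one option is to have the compiler write an assembly listing for a small test file. This is only a minimal sketch: the file name and the two helper functions are made up for illustration and are not part of the original Simulator code.

```cpp
// intrinsics_check.cpp (hypothetical file name, for illustration only)
#include <cmath>

// Hypothetical helper: isolates the sqrt call so it is easy to
// find in the generated assembly listing.
float vectorLength(float lenSq)
{
    return std::sqrt(lenSq);
}

// Hypothetical helper: isolates the pow call in the same way.
float distribution(float baseVal, float expVal)
{
    return std::pow(baseVal, expVal);
}
```

Compiling just this file with something like `cl /c /O2 /Oi /fp:fast /FAs intrinsics_check.cpp` should write intrinsics_check.asm next to the source, with the source lines interleaved with the generated instructions. Going by the edit in the question, an inlined sqrt shows up as an "fsqrt" instruction, while the library version shows up as a call to __CIsqrt. Adding /arch:SSE2 may change which instructions appear, so it is worth comparing the listings with and without it.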