For my C++/CLI project I just tried to measure the cost of C++/CLI function pointers versus .NET delegates.
I expected C++/CLI function pointers to be faster than .NET delegates. So my test separately counts the number of invocations of the .NET delegate and of the native function pointer over 5 seconds.
Now the results were (and still are) shocking to me:
That means the native C++/CLI function pointer is almost 3x slower than a managed delegate called from within C++/CLI code. How can that be? Should I use managed constructs when it comes to interfaces, delegates or abstract classes in performance-critical sections?
The function which gets called continuously:
__int64 DoIt(int n, __int64 sum)
{
    if ((n % 3) == 0)
        return sum + n;
    else
        return sum + 1;
}
The code that invokes the method tries to use all the parameters as well as the return value, so nothing gets optimized away (hopefully). Here's the code (for .NET delegates):
__int64 executions;
__int64 result;
System::Diagnostics::Stopwatch^ w = gcnew System::Diagnostics::Stopwatch();
System::Func<int, __int64, __int64>^ managedPtr = gcnew System::Func<int, __int64, __int64>(&DoIt);
w->Restart();
executions = 0;
result = 0;
while (w->ElapsedMilliseconds < 5000)
{
    for (int i=0; i < 1000000; i++)
        result += managedPtr(i, executions);
    executions++;
}
System::Console::WriteLine(".NET delegate: {0}M executions with result {2} in {1}ms", executions, w->ElapsedMilliseconds, result);
Similar to the .NET delegate invocation, the C++ function pointer is used:
typedef __int64 (* DoItMethod)(int n, __int64 sum);
DoItMethod nativePtr = DoIt;
w->Restart();
executions = 0;
result = 0;
while (w->ElapsedMilliseconds < 5000)
{
    for (int i=0; i < 1000000; i++)
        result += nativePtr(i, executions);
    executions++;
}
System::Console::WriteLine("Function pointer: {0}M executions with result {2} in {1}ms", executions, w->ElapsedMilliseconds, result);
All tests done:
The direct call to "DoIt" is represented here by "Function call", which seems to get inlined by the compiler, as there is no (significant) difference in execution counts compared to a call to the inlined function.
Calls to C++ virtual methods are as 'slow' as the function pointer. A virtual method of a managed class (ref class) is as fast as the .NET delegate.
Update: I dug a little deeper, and it seems that for the tests with unmanaged functions, the transition to native code happens each time the DoIt function gets called. Therefore I wrapped the inner loop in another function, which I forced to compile as unmanaged code:
#pragma managed(push, off)
__int64 TestCall(__int64* executions)
{
    __int64 result = 0;
    for (int i=0; i < 1000000; i++)
        result += DoItNative(i, *executions);
    (*executions)++;
    return result;
}
#pragma managed(pop)
Additionally, I tested std::function like this:
#pragma managed(push, off)
__int64 TestStdFunc(__int64* executions)
{
    __int64 result = 0;
    std::function<__int64(int, __int64)> func(DoItNative);
    for (int i=0; i < 1000000; i++)
        result += func(i, *executions);
    (*executions)++;
    return result;
}
#pragma managed(pop)
Now, the new results are:
std::function is a bit disappointing.
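To check whether the std::function gap is specific to C++/CLI, the two batches above can be sketched in standard C++ and built with any native compiler. This is only a rough sketch under assumptions: DoItNative is not shown above, so it is assumed here to have the same body as DoIt; std::int64_t stands in for __int64; and RunPointerBatch, RunStdFunctionBatch and TimeBatchMs are hypothetical helper names, not part of the original test.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>

// Assumption: DoItNative has the same body as DoIt, written with standard types.
std::int64_t DoItNative(int n, std::int64_t sum)
{
    if ((n % 3) == 0)
        return sum + n;
    else
        return sum + 1;
}

using DoItMethod = std::int64_t (*)(int, std::int64_t);
using DoItFunction = std::function<std::int64_t(int, std::int64_t)>;

// One batch of calls through a raw function pointer (hypothetical helper).
std::int64_t RunPointerBatch(DoItMethod fn, int calls, std::int64_t executions)
{
    std::int64_t result = 0;
    for (int i = 0; i < calls; i++)
        result += fn(i, executions);
    return result;
}

// The same batch, but each call goes through std::function (hypothetical helper).
std::int64_t RunStdFunctionBatch(const DoItFunction& fn, int calls, std::int64_t executions)
{
    std::int64_t result = 0;
    for (int i = 0; i < calls; i++)
        result += fn(i, executions);
    return result;
}

// Runs a callable once, stores its return value in resultOut,
// and returns the elapsed wall-clock time in milliseconds.
template <typename Batch>
double TimeBatchMs(Batch batch, std::int64_t& resultOut)
{
    const auto start = std::chrono::steady_clock::now();
    resultOut = batch();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling TimeBatchMs with a lambda such as `[&] { return RunPointerBatch(DoItNative, 1000000, executions); }` and again with the std::function variant gives two timings for identical work. Both batches must return the same result, which also keeps the compiler from discarding the calls. An optimizing compiler may still inline the pointer call when the target is visible, so the numbers are only indicative.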
You are seeing the cost of "double thunking". The core problem with your DoIt() function is that it is being compiled as managed code. The delegate call is very fast because going from managed code to managed code through a delegate is uncomplicated. The function pointer is slow, however: the compiler automatically generates code that first switches from managed code to unmanaged code and makes the call through the function pointer. That call then ends up in a stub that switches from unmanaged code back to managed code and calls DoIt().
Presumably what you really meant to measure was a call to native code. Use a #pragma to force DoIt() to be generated as machine code, like this:
#pragma managed(push, off)
__int64 DoIt(int n, __int64 sum)
{
    if ((n % 3) == 0)
        return sum + n;
    else
        return sum + 1;
}
#pragma managed(pop)
You'll now see that the function pointer is faster than a delegate.