C++11 Performance: Lambda inlining vs Function template specialization

My question expands on this one: Why can lambdas be better optimized by the compiler than plain functions?

To reiterate, the conclusion is that each lambda has a unique type, so every lambda produces its own template specialization, which the compiler can trivially inline. Function pointers are harder to inline because all functions with the same prototype share a single specialization. Considering that, would functions passed as non-type template parameters be as fast as, or faster than, lambdas?

int add(int a, int b) { return a + b; }
int sub(int a, int b) { return a - b; }

template <class F>
int operate(int a, int b, F func)
{
    return func(a, b);
}

template <int func(int, int)>
int operateFuncTemplate(int a, int b)
{
    return func(a, b);
}

int main()
{
    // hard to inline (can't determine statically whether operate's func is add or sub, since it's just a function pointer)
    auto addWithFuncP = operate(1, 2, add);
    auto subWithFuncP = operate(1, 2, sub);

    // easy to inline (lambdas are unique so 2 specializations made, each easy to inline)
    auto addWithLamda = operate(1, 2, [](int a, int b) { return a + b; });
    auto subWithLamda = operate(1, 2, [](int a, int b) { return a - b; });

    // also easy to inline? specialization means there are 2 made, instead of just 1 function definition with indirection?
    auto addWithFuncT = operateFuncTemplate<add>(1, 2);
    auto subWithFuncT = operateFuncTemplate<sub>(1, 2);
}

So if I could rank these on a scale of performance then:

operateFuncTemplate >= operate<LAMBDA> >= operate<FUNCTIONPTR>

Are there instances where this relation could fail in non-trivial examples?

Solution

If the compiler can track "this function pointer points to this function", the compiler can inline the call through the function pointer.

Sometimes compilers can do this. Sometimes they cannot.
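As a rough illustration of the two cases, here is a minimal sketch. The names knownTarget and unknownTarget are hypothetical, and what actually gets inlined depends on the compiler and optimization level.

```cpp
#include <cassert>

int add(int a, int b) { return a + b; }
int sub(int a, int b) { return a - b; }

// The pointer is initialized and called in the same scope, so an optimizer
// can usually see that fp always refers to add and may inline the call.
int knownTarget(int a, int b)
{
    int (*fp)(int, int) = add;
    return fp(a, b);
}

// The pointer depends on a run-time value, so the call site alone does not
// tell the compiler which function will be called.
int unknownTarget(bool useAdd, int a, int b)
{
    int (*fp)(int, int) = useAdd ? add : sub;
    return fp(a, b);
}

// Sanity check: both paths agree when the run-time choice is add.
void checkTargets()
{
    assert(knownTarget(1, 2) == unknownTarget(true, 1, 2));
}
```

Comparing the generated assembly for the two functions at your usual optimization level is the reliable way to see what your compiler does with each.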

Unless you store a lambda in a function pointer, std::function, or similar type-erasing wrapper, the compiler at the point where the lambda is called knows the type of the lambda, so knows the body of the lambda. The compiler can trivially inline the function call.
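The difference between keeping and erasing the lambda's type can be sketched as follows. operateErased, demoDirect and demoErased are hypothetical names introduced for illustration, and whether the optimizer sees through the std::function wrapper in a given case is compiler-dependent.

```cpp
#include <cassert>
#include <functional>

// F is deduced as the lambda's unique closure type, so the body of the
// call is known from the type alone.
template <class F>
int operate(int a, int b, F func)
{
    return func(a, b);
}

// Every callable is converted to the same std::function type, so the
// target is stored at run time and reached through an indirect call.
int operateErased(int a, int b, const std::function<int(int, int)>& func)
{
    return func(a, b);
}

int demoDirect()
{
    return operate(1, 2, [](int a, int b) { return a + b; });
}

int demoErased()
{
    return operateErased(1, 2, [](int a, int b) { return a + b; });
}

// Both paths compute the same value; only what the optimizer can see differs.
void checkSameResult()
{
    assert(demoDirect() == demoErased());
}
```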

Using a function template changes none of this, unless the function is passed as a compile-time constant, such as a function non-type template parameter:

template <int func(int, int)>

This is an example of that. Here, the function called in the body of the function template is guaranteed to be known at compile time.

Pass that func anywhere else, however, and the compiler can lose track of it.
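A minimal sketch of that situation, assuming a hypothetical helper callThrough that takes an ordinary function pointer. Once the template argument is passed as a run-time argument, the compile-time guarantee only holds inside the template itself.

```cpp
#include <cassert>

int add(int a, int b) { return a + b; }

// Ordinary function: fp is a run-time value here, whatever the caller knew.
int callThrough(int (*fp)(int, int), int a, int b)
{
    return fp(a, b);
}

// The call target is fixed by the template argument.
template <int func(int, int)>
int operateDirect(int a, int b)
{
    return func(a, b);
}

// func is still a compile-time constant here, but it is handed to
// callThrough as a plain pointer. Unless callThrough is itself inlined,
// the call inside it is an ordinary indirect call.
template <int func(int, int)>
int operateForwarded(int a, int b)
{
    return callThrough(func, a, b);
}

// Both variants return the same value; only the optimizer's view differs.
void checkForwarding()
{
    assert(operateDirect<add>(1, 2) == operateForwarded<add>(1, 2));
}
```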

In any case, any speed difference is going to be highly context-dependent. Sometimes the larger binary size caused by inlining a lambda costs more than the inability to inline a call through a function pointer, so performance can go the other way.

Any universal claim like the one you are trying to make is going to be wrong sometimes.