I wish to have a template method, which takes in data and processes it with a lambda function, whatever way the method itself wants to do that. However, I want the lambda function to get inlined so that the compiled assembly output won't end up having a "call" assembly instruction. Is this possible?
If it's not possible with lambdas, is there some other way to do that? Somehow using templates to pass a function as a template type or something?
I'm using C++17.
Below is an example of what I'm trying to achieve:
template <typename T>
static inline void Process(const T* p_source1,
                           const T* p_source2,
                           T* p_destination,
                           const int count,
                           std::function<T (T, T)> processor)
{
    for (int i = 0; i < count; i++)
        p_destination[i] = processor(p_source1[i], p_source2[i]);
}

void Process_Add(const uint8_t* p_source1,
                 const uint8_t* p_source2,
                 uint8_t* p_destination,
                 const int count)
{
    // How to make something like this lambda inline?
    auto lambda = [] (uint8_t a, uint8_t b) { return a + b; };
    Process<uint8_t>(p_source1, p_source2, p_destination, count, lambda);
}
Yes, it's possible, but std::function makes it very unlikely: its type-erased call mechanism is complex enough that compilers usually can't inline through it, even in simple cases.
See Understanding the overhead from std::function and capturing synchronous lambdas
Here's the typical way of making inlining more likely:
#include <concepts>    // std::invocable, std::convertible_to (C++20)
#include <type_traits> // std::invoke_result_t

template <typename T, typename F>
    requires (std::invocable<F, const T&, const T&> // optional: C++20 constraint
           && std::convertible_to<std::invoke_result_t<F, const T&, const T&>, T>)
inline void Process(const T* p_source1,
                    const T* p_source2,
                    T* p_destination,
                    const int count,
                    F processor)
{
    for (int i = 0; i < count; i++)
        p_destination[i] = processor(p_source1[i], p_source2[i]);
}
Each lambda expression has a unique closure type, so processor(...) invokes a call operator which is known at compile time.
This makes inlining quite likely, as long as the lambda expression is relatively short.
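To illustrate the point about closure types, here is a minimal sketch. The names apply_binary, add1 and add2 are hypothetical and only serve the example. Two lambdas with identical bodies still have distinct types, and the template parameter F is deduced as that exact type, so the call inside the template is an ordinary direct call rather than an indirect one.

```cpp
#include <type_traits>

// Hypothetical helper, for illustration only: calls f on two ints.
template <typename F>
int apply_binary(F f, int a, int b)
{
    // F is the concrete closure type, so this resolves statically
    // to F::operator() and is a normal candidate for inlining.
    return f(a, b);
}

// Two textually identical lambdas (C++17 inline variables).
inline auto add1 = [](int a, int b) { return a + b; };
inline auto add2 = [](int a, int b) { return a + b; };

// Each lambda expression yields its own unique closure type.
static_assert(!std::is_same_v<decltype(add1), decltype(add2)>);
```

Whether the call actually gets inlined is still up to the optimizer and the optimization level, so checking the generated assembly is the only way to be sure.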
You could imitate the C++20 constraint with std::enable_if_t, or you could just leave the function unconstrained.
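Since the question targets C++17, here is one possible sketch of the std::enable_if_t variant. It assumes std::is_invocable_r_v is an acceptable stand-in for the two C++20 concepts (it checks that F is callable with two const T& arguments and that the result converts to T). The static_cast in the lambda is my addition to make the narrowing from int explicit.

```cpp
#include <cstdint>
#include <type_traits>

// C++17 sketch: SFINAE constraint in place of the C++20 requires-clause.
template <typename T, typename F,
          std::enable_if_t<
              std::is_invocable_r_v<T, F, const T&, const T&>, int> = 0>
void Process(const T* p_source1,
             const T* p_source2,
             T* p_destination,
             const int count,
             F processor)
{
    for (int i = 0; i < count; i++)
        p_destination[i] = processor(p_source1[i], p_source2[i]);
}

void Process_Add(const std::uint8_t* p_source1,
                 const std::uint8_t* p_source2,
                 std::uint8_t* p_destination,
                 const int count)
{
    // T and F are both deduced; F is the lambda's closure type,
    // so no std::function is involved in the call.
    Process(p_source1, p_source2, p_destination, count,
            [](std::uint8_t a, std::uint8_t b) {
                // a + b is promoted to int; narrow it back explicitly.
                return static_cast<std::uint8_t>(a + b);
            });
}
```

Note that the call site no longer spells out Process<uint8_t>, because both template arguments can be deduced from the function arguments.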
Using static and inline in combination is basically pointless. static gives functions internal linkage, and that's likely not your intent, assuming this template is used in more than one cpp file.
See Should one never use static inline function?