c++ performance c++11 lambda

Is there a performance penalty if I use a lambda instead of an if block?


I'm using Dear ImGui to draw my UIs, in immediate mode, in an OpenGL program.

To draw a tab, you typically do:

if (ImGui::BeginTabItem("Some Tab")) {
    // Stuff
    ImGui::EndTabItem();
}

I come from Kotlin, where you typically use constructs like this:

something("bla bla bla") { x ->
  // whatever
} // this is a Kotlin lambda btw

So I wrote a simple wrapper, to wrap the call instead of writing the if and the end call, in order to avoid accidentally omitting the end call:

inline void tab(const char* label, std::function<void()> fn)
{
    if (ImGui::BeginTabItem(label)) {
        fn();
        ImGui::EndTabItem();
    }
}

This replaces the first snippet with something like:

ui::tab("Some other tab", []{
    // More UI...
});

The question is: will the compiler emit similar code, or will there be a big performance impact?

I'm afraid it will be a problem if the compiler creates a new callable object every time the UI is drawn.

Also, I'm capturing the this pointer to use inside the lambda:

if (ImGui::BeginTabItem("Some Tab")) {
    // Stuff
    ImGui::EndTabItem();
}

// -- VS -- //

ui::tab("Some other tab", [this]{
    // More UI...
});

Solution

  • Is there a performance penalty if I use a lambda instead of an if block?

    It depends upon your compiler and your optimization flags.

    With a recent GCC invoked as g++ -Wall -O3 -mtune=native, you may be surprised by the optimizations it can perform, including inline expansion. You might also be interested in link-time optimization and whole-program optimization (e.g. compile and link with g++ -O3 -flto -fwhole-program ....). Read about GCC optimization flags.

    See for example this draft report, funded by the CHARIOT H2020 project.

    Of course, the devil is in the details.

    And you could extend GCC with your own plugin to improve the optimizations even further.

    Be however aware that compiler optimization is theoretically undecidable (see λ-calculus, Rice's theorem, MRDP theorem, incompleteness theorems, Curry-Howard correspondence, AGI, J.Pitrat's blog, the RefPerSys project, etc...) and practically imperfect (more an art than a science).

    The complexity (and trade secrets) of current high-end processors (CPU caches, branch predictors, superscalar architectures) makes worst-case execution time analysis practically impossible. In practice, you'll encounter cases where your compiler's optimizations disappoint you. So put effort into profiling (e.g. with gprof or perf on Linux).
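As a crude first check before reaching for gprof or perf, the two call styles can be timed in-process. The following is only a rough sketch under assumptions: `call_erased`, `call_direct`, `ErasedCaller`, `DirectCaller` and `measure` are hypothetical names, and the volatile counter merely stands in for real UI work. Results depend heavily on the compiler and flags, so treat any numbers as indicative at best.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>

// Hypothetical stand-ins for the two wrapper styles being compared.
inline void call_erased(const std::function<void()>& fn) { fn(); }

template <typename F>
inline void call_direct(F&& fn) { fn(); }

// C++11 has no generic lambdas, so small functors select the call style.
struct ErasedCaller {
    // Passing a lambda here builds a temporary std::function on every call.
    template <typename F>
    void operator()(F& fn) const { call_erased(fn); }
};

struct DirectCaller {
    // The lambda's concrete type is visible, so no type erasure happens.
    template <typename F>
    void operator()(F& fn) const { call_direct(fn); }
};

struct Timing {
    std::uint64_t calls;       // how many times the body actually ran
    std::int64_t nanoseconds;  // wall-clock time for the whole loop
};

// Runs invoke(body) `iterations` times and reports the elapsed time.
template <typename Invoke>
Timing measure(Invoke invoke, int iterations) {
    volatile std::uint64_t counter = 0;  // volatile keeps the work observable
    auto body = [&counter] { counter = counter + 1; };

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        invoke(body);
    }
    const auto stop = std::chrono::steady_clock::now();

    const auto ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    return Timing{counter, static_cast<std::int64_t>(ns)};
}
```

Calling `measure(ErasedCaller{}, n)` and `measure(DirectCaller{}, n)` with the same `n`, at the optimization level you actually ship with, gives two timings to compare. A real profiler remains the better tool for anything beyond a sanity check.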

    See also the Ctuning, CompCert and Milepost GCC projects. Consider also OpenCL, OpenMP, OpenACC.

    Lastly, you could (on many platforms) generate specific code at runtime (using partial evaluation techniques). Consider then using JIT compilation libraries such as libgccjit. On some operating systems, you could generate C++ code at runtime, then compile it and load it as a plugin (e.g. with dlopen).

    Read, of course, Ken Thompson's Reflections on Trusting Trust paper and Bjarne Stroustrup's papers.

    You could also consider using JNI, SBCL or LuaJIT. All of them can be mixed with C++ on major computing platforms and facilitate runtime code generation in practice, so with some effort they could improve execution time.

    You are coding:

    inline void tab(const char* label, std::function<void()> fn)

    (and your code doesn't do what you want if fn throws an exception, perhaps indirectly, because ImGui::EndTabItem() is then skipped). I would suggest instead:

     inline void tab(const char* label, const std::function<void()>& fn)
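Going one step further than the signature above, one possible way to address both concerns (the std::function type erasure and the skipped end call when fn throws) is a template wrapper with a small RAII guard. This is a minimal sketch, not ImGui's own API: the `fake_imgui` namespace is a stand-in for the real ImGui::BeginTabItem and ImGui::EndTabItem so that the example compiles on its own, and `TabGuard` is a hypothetical helper name.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Stand-in for Dear ImGui so the sketch compiles alone; in real code these
// would be ImGui::BeginTabItem and ImGui::EndTabItem.
namespace fake_imgui {
inline std::vector<std::string>& log() {
    static std::vector<std::string> entries;
    return entries;
}
inline bool BeginTabItem(const char* label) {
    log().push_back(std::string("begin ") + label);
    return true;  // pretend the tab is always selected
}
inline void EndTabItem() { log().push_back("end"); }
}  // namespace fake_imgui

namespace ui {
// Hypothetical RAII guard: EndTabItem runs in the destructor, so it is
// still called when the tab body throws.
class TabGuard {
public:
    explicit TabGuard(const char* label)
        : open_(fake_imgui::BeginTabItem(label)) {}
    ~TabGuard() {
        if (open_) fake_imgui::EndTabItem();  // only end a tab that was begun
    }
    TabGuard(const TabGuard&) = delete;
    TabGuard& operator=(const TabGuard&) = delete;
    explicit operator bool() const { return open_; }

private:
    bool open_;
};

// A template parameter instead of std::function: the compiler sees the
// lambda's concrete type, so there is no type erasure and the call is a
// straightforward candidate for inlining.
template <typename F>
inline void tab(const char* label, F&& fn) {
    TabGuard guard(label);
    if (guard) std::forward<F>(fn)();
}
}  // namespace ui
```

The call site keeps the shape from the question, e.g. `ui::tab("Some other tab", [this]{ /* More UI... */ });`. Whether the generated code matches the hand-written if block still depends on the compiler and flags, so checking the emitted assembly or profiling remains worthwhile.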