c++ assembly gcc lambda higher-order-functions

Why is a function that only returns a stateful lambda compiled down to any assembly at all?


The following non-templated (or is it?) function returning a non-generic, stateful lambda,

auto foo(double a) {
    return [a](double b) -> double {
        return a + b;
    };
}

compiles down to this with GCC or Clang. Why?

foo(double):
        ret

I would expect it to generate no output at all.


The origin of the question

I don't even remember why I started writing the snippet above, but anyway...

Initially I thought it should have compiled to something slightly longer, as I was expecting to see at least an add instruction.

But then I realized that the foo above cannot just be compiled alone in a cpp file and then linked against another TU where it's used, because I can't even write a usable declaration-only version of it: the return type is the lambda's closure type, which nobody can name without seeing the definition.

So I get to the point that the only reason to write that function in a non-header file is that it is used in that non-header file, at which point the compiler can presumably inline it wherever it's used.

But if that's the case... then why compile foo down to anything, if it's the only thing in the TU? There's nothing to link against, so why is any output generated at all for it?
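To make the "used in the same file" case concrete, here is a minimal sketch. The caller add_one and the helper check_add_one are hypothetical names that are not part of my original snippet. With optimization enabled I would expect foo to be inlined into add_one, leaving little more than an addition there, although the exact output depends on the compiler and flags.

```cpp
#include <cassert>

// Same function as above: the return type is deduced from the lambda,
// so a caller must see this definition before it can call foo.
auto foo(double a) {
    return [a](double b) -> double {
        return a + b;
    };
}

// Hypothetical caller living in the same TU (not in the original snippet).
double add_one(double x) {
    auto f = foo(1.0);  // closure capturing a == 1.0
    return f(x);        // computes 1.0 + x
}

// Small sanity check; call it from main() to run the sketch.
void check_add_one() {
    assert(add_one(2.0) == 3.0);
}
```

Even with this caller present, foo still has external linkage, so I would still expect a stand-alone foo(double) to show up in the output next to add_one.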


Using a struct+operator() doesn't change anything, as this

struct Bar {
    double a;
    double operator()(double b) const {
        return a + b;
    }
};

Bar bar(double a) {
    return Bar{a};
}

generates the same ret-only code, which is also obvious in hindsight, as there's no way to even link against this bar function from other TUs, if Bar is hidden in the cpp file.


Solution

  • Looks like a missed optimization that GCC and Clang presumably don't try to look for. They'd have to run an extra optimization pass looking for functions that return lambdas like this and treat them as inline so they could avoid emitting a stand-alone definition in TUs that don't need one.

    But if they did that, they'd fail to detect violations of the one-definition rule at link time.
    e.g. the program would be ill-formed if another TU contained int foo(double a){ return a; }
    So that's a good reason not to treat it as inline and emit no asm.

    But I guess they could emit just a .global foo ; foo: to define the symbol for ODR violation detection, without spending any code-size on a ret. That would be a special case of generating asm, and likely not worth the extra code in the GCC or Clang internals needed to recognize such functions and give them this special handling.

    The value of such an optimization would be very minimal since, as you say, this is just dead code in a real program where this function is in a TU with nothing that calls it. It would cost compile time to look for it, and it would be more code for GCC / Clang devs to maintain.
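If you want no stand-alone definition, you can ask for that yourself instead of waiting for the compiler to guess. The following is a minimal sketch, assuming typical GCC / Clang behaviour; foo_static, foo_inline and check_variants are hypothetical names used only for illustration. A static function has internal linkage, and an inline function must be defined in every TU that uses it, so in both cases the compiler is free to emit nothing when the function isn't needed out of line.

```cpp
#include <cassert>

// Internal linkage: no other TU can refer to this symbol, so the compiler
// may drop the definition when nothing in this TU needs it out of line.
static auto foo_static(double a) {
    return [a](double b) -> double { return a + b; };
}

// inline: every TU that uses it has to see the definition anyway, so an
// out-of-line copy is only needed where the function is actually used.
inline auto foo_inline(double a) {
    return [a](double b) -> double { return a + b; };
}

// Uses both variants so the sketch can be run; remove this function to
// inspect what the compiler emits for the unused definitions alone.
void check_variants() {
    assert(foo_static(1.0)(2.0) == 3.0);
    assert(foo_inline(1.0)(2.0) == 3.0);
}
```

The trade-off is the one described above: a static foo no longer defines a global symbol, so a conflicting foo in another TU would not be caught by the linker.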