
Why can't GCC optimize constructor call via alias?


Somewhat inspired by this question.

Given this code:

#include <string>

template<class T>
struct A {
    template <typename U> using NewA = A<U>;
    constexpr A(T const& t){}
    constexpr auto f() const {
        return NewA{"bye"};
    }
};

A(const char*) -> A<std::string>;

int main() {
    A{"hello"}.f();
}

GCC 13.1 generates a lot of unnecessary code, most notably calls to the std::string constructor and destructor, plus some other things:

main:
        sub     rsp, 72
        mov     edx, OFFSET FLAT:.LC1+5
        mov     esi, OFFSET FLAT:.LC1
        lea     rax, [rsp+16]
        mov     rdi, rsp
        mov     QWORD PTR [rsp], rax
        call    void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*>(char const*, char const*, std::forward_iterator_tag) [clone .isra.0]
        lea     rax, [rsp+48]
        mov     edx, OFFSET FLAT:.LC2+3
        mov     esi, OFFSET FLAT:.LC2
        lea     rdi, [rsp+32]
        mov     QWORD PTR [rsp+32], rax
        call    void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*>(char const*, char const*, std::forward_iterator_tag) [clone .isra.0]
        lea     rdi, [rsp+32]
        call    std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_dispose()
        mov     rdi, rsp
        call    std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_dispose()
        xor     eax, eax
        add     rsp, 72
        ret

If I replace the line return NewA{"bye"}; with return ::A{"bye"}; (which, in my opinion, should be exactly equivalent):

#include <string>

template<class T>
struct A {
    template <typename U> using NewA = A<U>;
    constexpr A(T const& t){}
    constexpr auto f() const {
        return ::A{"bye"};
    }
};

A(const char*) -> A<std::string>;

int main() {
    A{"hello"}.f();
}

the compiler is able to optimize everything down to a single xor:

main:
        xor     eax, eax
        ret


Is this some kind of "early version" bug? Clang can't even compile this code yet (it doesn't support CTAD via alias templates).

UPD: It looks like GCC 10.1, at least, can optimize everything perfectly.


Solution

  • Both of your variants, return ::A{"bye"}; and return NewA{"bye"};, make your program IFNDR (ill-formed, no diagnostic required). (See the note at the bottom, though.)

    The deduction guides considered for CTAD are those reachable from the instantiation context in which CTAD is performed.

    So your user-declared deduction guide is always considered in the implicit instantiation caused by A{"hello"}.f().

    However, both NewA{"bye"} and ::A{"bye"} are non-dependent. For a non-dependent construct in a template definition, there is an additional requirement: its interpretation in a hypothetical instantiation immediately following the template definition must not differ from its interpretation in any actual instantiation of the template. If this is not satisfied, the program is IFNDR (see [temp.res.general]/6.6).

    In your case, the user-declared deduction guide is not reachable from the hypothetical instantiation immediately after the definition of A, so CTAD there would deduce A<char[4]> instead of A<std::string>, which is a different interpretation than in the actual instantiation.

    The rule permits the compiler to do all of the deduction, overload resolution, etc. for non-dependent constructs immediately where they are defined. I suspect that this is what GCC is doing here for the ::A{"bye"} variant (which is clearly non-dependent), but not for the NewA{"bye"} variant (which is a bit less clear; see the end of this answer). Choosing A<char[4]> instead of A<std::string> there means one fewer std::string temporary has to be constructed, which probably makes the optimization much simpler for the compiler.

    To optimize everything away, the compiler has to decide to inline all std::string constructor and destructor calls and everything they call. Then, unless the small-string optimization (SSO) applies, it must also recognize matching operator new/operator delete calls so that it can replace them with stack memory. (In general, a call to these allocation functions is observable, because they can be replaced anywhere in the program, even at runtime via dynamic linking. However, the compiler is allowed to match such calls and omit both of them if it can provide the storage in another way, e.g. on the stack.)


    EDIT: Originally I claimed that NewA{"bye"} was dependent, but on reflection, both ::A{"bye"} and NewA{"bye"} seem to be non-dependent, so both are probably IFNDR. The dependence of placeholders for deduced class types does not currently seem to be clearly specified (see CWG issue 2600), but under the proposed resolution both of your variants would indeed be non-dependent and therefore IFNDR.