c++ gcc optimization clang virtual

Is it possible to devirtualize this method call with GCC?


In the code below, I expect the a->f2() call to be devirtualized, provided that compiler optimizations are enabled (-O3).

#include <memory>
#include <iostream>

class AbstractA {
public:
    virtual ~AbstractA() = default;

    virtual void f1() = 0;
    virtual void f2() {
        std::cout << "AbstractA::f2" << std::endl;
        this->f1();
    }
};

class ConcreteA final : public AbstractA {
    int x_;

public:
    /*void f2() override {
        this->f1();
    }*/

    void f1() override {
        std::cout << "ConcreteA::f1" << std::endl;
        x_++;
    }

    int get() const {
        return x_;
    }
};

int main() {
    auto a = std::make_unique<ConcreteA>();
    a->f2();
    return a->get();
}

Clang appears to handle this properly by inlining the subsequent method calls, whereas GCC emits a vtable and still seems to consult it later in the assembly:

movq $vtable for ConcreteA+16, (%rax)
call AbstractA::f2()
...
movq 16(%rax), %rax
cmpq $ConcreteA::f1(), %rax
jne .L17

I am aware that the C++ standard does not require compilers to devirtualize calls. Nonetheless, my question is: is there anything that can be done so that GCC optimizes the call and omits the vtable lookup?


Solution

  • Assuming you know that AbstractA::f2 is not a pure virtual member function, you can simply override it explicitly in ConcreteA so that GCC can inline AbstractA::f2 and, in turn, ConcreteA::f1:

        // In ConcreteA:
    
        void f2() override
        {
            AbstractA::f2();
        }
    

    The vtable is still present because of a missing global optimization: GCC appears to assume that the classes could be used from outside this translation unit, while Clang knows this is not the case and also tracks pointers better. CRTP may help to remove the vtable if it is a problem.

    In theory, GCC has all the information it needs to optimize the unmodified code, but it does not do so because the optimization is not implemented yet. Using Profile-Guided Optimization (PGO, -fprofile-generate followed by -fprofile-use) together with Link-Time Optimization (LTO, -flto) can help a lot to speculatively devirtualize member function calls, since GCC can then see that no other class derives from AbstractA, that ConcreteA is only used in main, track pointers better, and so on.