c++ oop pointers compiler-construction

Why can't C++ compiler know a pointer is pointing to a derived class?


I've just started learning about OOP in C++. Why is the virtual keyword needed to tell the compiler to do late binding? Why can't the compiler know at compile time that the pointer is pointing to an object of a derived class?

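#include <iostream>

class A {
    public: int f() { return 'A'; }  // non-virtual
};

class B : public A {
    public: int f() { return 'B'; }  // hides A::f()
};

int main() {
    A* pa;
    B b;
    pa = &b;
    // The static type of pa is A*, so this call binds to A::f().
    std::cout << pa->f() << std::endl;
}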

Solution

  • (I started out with some comments on an answer, but decided I should just write up my own answer.)

    I've rearranged your code slightly here to make it easier to compile and view the output:

    #include <iostream>
    
    #ifdef V
    #define VIRTUAL virtual
    #else
    #define VIRTUAL /*nothing*/
    #endif
    
    class A { 
        public: VIRTUAL char f() { return 'A';}
    };
    
    class B : public A {
        public: char f() { return 'B';}
    };
    
    int main() {
        A* pa;
        B b;
        pa = &b; 
        std::cout << pa->f() << std::endl;
    }
    

    Compiling and running it shows:

    $ c++ t.cc && ./a.out
    A
    $ c++ -DV t.cc && ./a.out
    B
    

    which shows that the virtual keyword changes the behavior of the program. This is in fact required by the language standard. Your question could, I think, be best rephrased as "Why is the standard written this way?" (which has a more useful general answer) rather than "Can the compiler optimize my code?" (which has a specific but useless answer: yes, it can, but without virtual it's still required to print A, not B).

    The language definition doesn't forbid the compiler from doing special optimization tricks. On the contrary, and especially so for C++, the language specification deliberately tries to make it easier for compiler writers to optimize. This winds up putting more of a burden on C++ programmers.

    If C++ were a different language ...

    The feature you're asking about, the virtual keyword, exists precisely because of this trade-off. The language could have been defined differently (and some other languages are): its designers could have said that compiler writers must never assume that, given some valid A* pa, pa points to an actual instance of type A. Then:

    std::cout << pa->f() << std::endl;
    

    would always have to figure out: What is the real underlying type of *pa and hence what function f shall I call here?

    In this hypothetical (not-C++) language,[1] a compiler that optimizes could take your code and build it to call B::f() directly, because pa points to an instance of type B. But in this same language, a compiler that tries to optimize heavily could not make assumptions about functions where the underlying type of pa is determined by something not predictable at compile-time:

    void f(A* pa) {
        std::cout << pa->f() << std::endl;
    }
    
    int main(int argc, char **argv) {
        A a;
        B b;
        f(argc > 1 ? &b : &a);
    }
    
    

    This program needs to print A when called with no extra arguments, and B when called with extra arguments. So if our not-C++ language lacks a virtual keyword, or defines it as a no-op, the free function f (which calls either A::f() or B::f() at run time) must always figure out which underlying function to call.
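    In real C++, getting that behavior requires declaring f() virtual in A. The following is a minimal sketch of the idea, not the only way to write it; the helper names call_f and pick_and_call are hypothetical and stand in for the free function f and for main's argc test above.

```cpp
// Same class shapes as in the answer, with f() declared virtual in the base.
struct A {
    virtual ~A() = default;
    virtual char f() { return 'A'; }
};

struct B : A {
    char f() override { return 'B'; }
};

// Hypothetical helper: nothing here tells the compiler which object pa
// refers to, so the call is resolved from the object's run-time type.
char call_f(A* pa) {
    return pa->f();
}

// Hypothetical helper mirroring `argc > 1 ? &b : &a` from the example above.
char pick_and_call(bool use_derived) {
    A a;
    B b;
    return call_f(use_derived ? &b : &a);
}
```

    Remove the virtual keyword (and the override specifier) and call_f returns 'A' for both objects, because the call is then bound from the static type A*.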


    [1] It's not C either. The name D is taken. Perhaps P, from the BCPL progression?


    Conclusion

    Because C++ does have the virtual keyword, the variant we build that has a non-virtual f() in base class A can optimize pa->f() calls by assuming that pa->f() calls A::f(). Hence, instead of actually calling A::f(), an optimizing compiler can just write "A\n" to std::cout. Whether or not the C++ compiler optimizes, the call must produce A rather than B.
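    A small sketch of that static binding, assuming the same non-virtual classes as in the first build; the helper names via_base_pointer and via_derived_pointer are hypothetical. The same B object gives different results depending only on the static type of the pointer used for the call.

```cpp
struct A {
    char f() { return 'A'; }   // non-virtual
};

struct B : A {
    char f() { return 'B'; }   // hides A::f(), does not override it
};

// Hypothetical helper: the call is bound at compile time from the
// static type A*, even though the object is really a B.
char via_base_pointer(B& b) {
    A* pa = &b;
    return pa->f();
}

// Hypothetical helper: same object, but the static type is now B*.
char via_derived_pointer(B& b) {
    B* pb = &b;
    return pb->f();
}
```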

    The variant with the virtual keyword inserted must not assume that pa->f() calls A::f(). If it can optimize enough to see that pa->f() calls B::f(), and therefore, at compile time, eliminate the call entirely and have the function write "B\n", that's OK! If it can't optimize that much, that's OK too—at least, as far as the language specification goes.

    You, as a programmer, are required to know this about the virtual keyword, and to use it whenever you want the function to be chosen according to the object's actual run-time class, whether or not the compiler is smart enough to work that out at compile time. If you want the compiler to use the base-class function every time a call goes through a base-class pointer, omit the virtual keyword.
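    To see both choices side by side, here is a hedged sketch in which one class has a virtual and a non-virtual member function. The names v, n, Result and probe are hypothetical and chosen only for illustration. It also shows that a qualified call such as pa->A::v() selects the base-class version even when the function is virtual.

```cpp
struct A {
    virtual ~A() = default;
    virtual char v() { return 'A'; }   // chosen by run-time type
    char n() { return 'A'; }           // chosen by static type
};

struct B : A {
    char v() override { return 'B'; }
    char n() { return 'B'; }           // hides A::n()
};

// Hypothetical record of what each kind of call returned.
struct Result {
    char virt;
    char nonvirt;
    char qualified;
};

// Hypothetical helper: make all three kinds of call through a base pointer.
Result probe(A* pa) {
    return { pa->v(),       // virtual dispatch
             pa->n(),       // static binding to A::n()
             pa->A::v() };  // qualified name: no virtual dispatch
}
```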