c++compiler-construction programming-languages compiler-optimization vtable

Programming Language Idea: Avoiding vtable lookups

I have been toying with an idea for a programming language for a while now: It would essentially be C++ and Java-like in syntax, meant for systems programming (or really any programming that requires high-performance), but with, in my opinion, a more enjoyable syntax than C++. I was thinking about how I would handle virtual methods in hierarchical class structures (my language would not include multiple inheritance), and ways of avoiding vtable lookups. My question is twofold:

To my understanding, the reason vtable lookups are such a performance-hit (in time-critical scenarios like game-development, at least), is because it requires deferencing the objects vtable pointer, and this vtable is generally a cache-miss. Is this correct, or am I missing some part of the problem?
My idea for a partial solution is this: If the compiler can determine an object's type completely (ie. it cannot be a type derived from the type it thinks it is), and this object is passed to a function as an argument whose type is a superclass of the object's type, then the location of the virtual method called in the function can be passed as a sort of "hidden" argument, which is added at compile-time. Perhaps an example would help:

Consider the following pseudo-code for a class hierarchy:

class Animal {
    public void talk() { /* Generic animal noise... */ }
    // ...
}

class Dog extends Animal {
    public void talk() { /* Override of Animal::talk(). */ }
    // ...
}

void main() {
    Dog d = new Dog();
    doSomethingWithAnimal(d);
}

void doSomethingWithAnimal(Animal a) {
    // ...
    a.talk();
    // ....
}

Keep in mind that this is pseudocode, not C++ or Java or similar. Also, assume that the Animal argument is implicitly passed by reference, not value. Because the compiler can see that d is definitely of type Dog, it could translate the doSomethingWithAnimal definition into something like this:

void doSomethingWithAnimal(Animal a, methodptr talk = NULL) {
    // ...
    if ( talk != NULL ) {
        talk(a);
    } else {
        a.talk();
    }
    // ...
}

Then main would look be translated by the compiler to something like this:

void main() {
    Dog d = new Dog();
    doSomethingWithAnimal(d, Dog::talk);
}

Obviously this wouldn't completely eradicate the need for a vtable, and one would probably still need to be provided for the cases when the objects exact type cannot be determined, but what are your thoughts on this as a performance optimization? I plan on using registers to pass arguments whenever possible, and even if the arguments had to spill onto the stack, it is more likely that the methodptr argument on the stack will be a cache-hit than the vtable values, right? Any and all thoughts are much appreciated.

Solution

Re Q1: Cache utilization is only one part of the "problem" with virtual calls actually. The whole point of virtual functions, and late binding in general, is that the call site may call any implementation without changes. This mandates some indirection:

Indirection implies space and/or time overhead to resolve the indirection.
No matter how you do the indirection, and indirect call can only be as fast as a static call if the CPU has a good branch predictor and the call site is monomorphic (i.e., only one implementation is ever called). And I'm not even sure if a perfectly-predicted branch is as fast as a static branch on all hardware developers care about.
Not knowing the called function at compile time also inhibits optimization based on knowing the called function (inlining, but also loop invariant code motion and possibly more).

Your approach doesn't change that, and hence leaves most of the performance problems in place: It still wastes some time and space (only on additional arguments and branches, rather than vtable lookups), it doesn't permit inlining or other optimizations, and it doesn't remove indirect calls.

Re 2: It's kind of an interprocedural spin on devirtualization, which C++ compilers already do to some degree (locally, with limitations as described by @us2012 in the comments). There are a few "minor" problems with it, but it may be worth it if applied selectively. Otherwise, you generate a lot more code, pass a lot of additional arguments, do a lot of extra branches, and only gain very little or even a net loss.

I assume the main issue is that it doesn't solve most of the performance issues described above. Generating specialized functions (rather than one generic body) for subclasses, and other variations of the same theme, may help with that. But that generates additional code that has to justify itself with performance gains, and the general consensus is that such aggressive optimizations aren't worth it for most code even in performance-critial programs.

In particular, virtual call overhead only matters in benchmarks of that very same feature, or if you already optimized the ever-loving hell out of everything else and would need a huge lot of tiny indirect calls (an example from game development: Several virtual method calls per geometrical object for drawing or frustum culling). In most code, the virtual calls don't matter, or at least not enough to warrant further optimization attempts. In addition, this is only relevant for AOT compilers, as JIT compilers have other ways of dealing with these problems. Look up polymorphic inline caches, and note that tracing JIT compilers can trivially inline all calls, virtual or not.

In summary: vtables are already a fast and general way to implement virtual functions (if they can be used, which is the case here). It's unlikely you're be able to improve much on them, let alone notice the improvement, except perhaps in some rare cases. If you want to try it though, you could try writing an LLVM pass that does something like this (though you'd have to work on a lower level of abstraction).