I have been toying with an idea for a programming language for a while now: It would essentially be C++ and Java-like in syntax, meant for systems programming (or really any programming that requires high-performance), but with, in my opinion, a more enjoyable syntax than C++. I was thinking about how I would handle virtual methods in hierarchical class structures (my language would not include multiple inheritance), and ways of avoiding vtable lookups. My question is twofold:
Consider the following pseudo-code for a class hierarchy:
class Animal {
public void talk() { /* Generic animal noise... */ }
// ...
}
class Dog extends Animal {
public void talk() { /* Override of Animal::talk(). */ }
// ...
}
void main() {
Dog d = new Dog();
doSomethingWithAnimal(d);
}
void doSomethingWithAnimal(Animal a) {
// ...
a.talk();
// ....
}
Keep in mind that this is pseudocode, not C++ or Java or similar. Also, assume that the Animal argument is implicitly passed by reference, not value. Because the compiler can see that d
is definitely of type Dog
, it could translate the doSomethingWithAnimal
definition into something like this:
void doSomethingWithAnimal(Animal a, methodptr talk = NULL) {
// ...
if ( talk != NULL ) {
talk(a);
} else {
a.talk();
}
// ...
}
Then main
would look be translated by the compiler to something like this:
void main() {
Dog d = new Dog();
doSomethingWithAnimal(d, Dog::talk);
}
Obviously this wouldn't completely eradicate the need for a vtable, and one would probably still need to be provided for the cases when the objects exact type cannot be determined, but what are your thoughts on this as a performance optimization? I plan on using registers to pass arguments whenever possible, and even if the arguments had to spill onto the stack, it is more likely that the methodptr argument on the stack will be a cache-hit than the vtable values, right? Any and all thoughts are much appreciated.
Re Q1: Cache utilization is only one part of the "problem" with virtual calls actually. The whole point of virtual
functions, and late binding in general, is that the call site may call any implementation without changes. This mandates some indirection:
Your approach doesn't change that, and hence leaves most of the performance problems in place: It still wastes some time and space (only on additional arguments and branches, rather than vtable lookups), it doesn't permit inlining or other optimizations, and it doesn't remove indirect calls.
Re 2: It's kind of an interprocedural spin on devirtualization, which C++ compilers already do to some degree (locally, with limitations as described by @us2012 in the comments). There are a few "minor" problems with it, but it may be worth it if applied selectively. Otherwise, you generate a lot more code, pass a lot of additional arguments, do a lot of extra branches, and only gain very little or even a net loss.
I assume the main issue is that it doesn't solve most of the performance issues described above. Generating specialized functions (rather than one generic body) for subclasses, and other variations of the same theme, may help with that. But that generates additional code that has to justify itself with performance gains, and the general consensus is that such aggressive optimizations aren't worth it for most code even in performance-critial programs.
In particular, virtual call overhead only matters in benchmarks of that very same feature, or if you already optimized the ever-loving hell out of everything else and would need a huge lot of tiny indirect calls (an example from game development: Several virtual method calls per geometrical object for drawing or frustum culling). In most code, the virtual calls don't matter, or at least not enough to warrant further optimization attempts. In addition, this is only relevant for AOT compilers, as JIT compilers have other ways of dealing with these problems. Look up polymorphic inline caches, and note that tracing JIT compilers can trivially inline all calls, virtual or not.
In summary: vtables are already a fast and general way to implement virtual functions (if they can be used, which is the case here). It's unlikely you're be able to improve much on them, let alone notice the improvement, except perhaps in some rare cases. If you want to try it though, you could try writing an LLVM pass that does something like this (though you'd have to work on a lower level of abstraction).