I just checked the size of a class containing dozens of virtual methods with g++ (4.7), because I had heard pointers are used for virtual methods and I thought that would be a terrible implementation, as it would take up 80 bytes for each instance of a class with a mere 10 virtual methods that way on my system.
To my relief, sizeof(<insert typename here>)
returned only 8 bytes, the size of a pointer on my system. I assume this means that it stores a pointer to the vtable, rather than each method and that I simply misunderstood what people were saying (or perhaps that most compilers are stupid).
However, before I finally tested this, I had been struggling with using virtual methods as pointers the way I expected them to work. I noticed that the address was in fact a relatively very low number, often under 100 and with a difference of 8 bytes compared to other ones, so I assumed it was some sort of index for an array. And then I went pondering about how I would implement vtables myself, and that would not be using a pointer, as the results of my test clearly show. I was surprised to see it use a whole 8 bytes (I verified whether or not it was just padding by inserting a char field, which returned 16 bytes with sizeof).
Instead, I would implement this by storing an array index (for example 4 bytes, or even 2 if 65536 or less classes with virtual methods are used) which would be searched for in a look-up table containing pointers to the vtables, and find it that way. So why is a pointer stored? For performance reasons, or did they simply reuse the code for 32-bit operating systems (as it would make no difference in memory size there)?
Thank you in advance.
edit:
Someone requested me to calculate the actual memory saved, and I decided to make a code example. Unfortunately, it became quite big (they requested me to use 10 virtual methods in both), but I tested it and it actually works. Here it comes:
#include <cstdio>
#include <cstdlib>
/* For the singleton lovers in this community */
class VirtualTableManager
{
unsigned capacity, count;
void*** vtables;
public:
~VirtualTableManager() {
delete vtables;
}
static VirtualTableManager& getInstance() {
static VirtualTableManager instance;
return instance;
}
unsigned addElement(void** vtable) {
if (count == capacity)
{
vtables = (void***) realloc(vtables, (capacity += 0x2000) * sizeof(void**)); /* Reserves an extra 64KiB of pointers */
}
vtables[count] = vtable;
return count++;
}
void** getElement(unsigned index) {
return index < capacity ? vtables[index] : 0; /* Just in case: "Hey guys, let's misuse the API!" */
}
private:
VirtualTableManager() : capacity(0), count(0), vtables(0) { }
VirtualTableManager(const VirtualTableManager&);
void operator =(const VirtualTableManager&);
};
class Real
{
public:
short someField; /* This is required to show the difference, because of padding */
Real() : someField(0) { }
virtual ~Real() {
printf("Real::~Real()\n");
}
virtual void method0() {
printf("Real::method0()\n");
}
virtual void method1(short argument) {
someField = argument;
}
virtual short method2() {
return someField;
}
virtual void method3() { }
virtual void method4() { }
virtual void method5() { }
virtual void method6() { }
virtual void method7() { }
virtual void method8() { }
};
class Fake
{
static void** vtable;
static unsigned classVIndex; /* Don't know what to call it, please forgive me for the lame identifier */
public:
unsigned instanceVIndex;
short someField;
Fake() : instanceVIndex(classVIndex), someField(0) { }
~Fake() {
reinterpret_cast<void (*)(Fake*)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[9])(this);
}
void method0() {
reinterpret_cast<void (*)(Fake*)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[0])(this);
}
void method1(short argument) {
reinterpret_cast<void (*)(Fake*, short argument)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[1])(this, argument);
}
short method2() {
return reinterpret_cast<short (*)(Fake*)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[2])(this);
}
void method3() {
reinterpret_cast<void (*)(Fake*)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[3])(this);
}
void method4() {
reinterpret_cast<void (*)(Fake*)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[4])(this);
}
void method5() {
reinterpret_cast<void (*)(Fake*)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[5])(this);
}
void method6() {
reinterpret_cast<void (*)(Fake*)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[6])(this);
}
void method7() {
reinterpret_cast<void (*)(Fake*)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[7])(this);
}
void method8() {
reinterpret_cast<void (*)(Fake*)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[8])(this);
}
protected:
Fake(unsigned instanceVIndex, short someField)
: instanceVIndex(instanceVIndex), someField(someField) { }
/* The 'this' keyword is an automatically passed pointer, so I'll just manually pass it and identify it as 'self' (thank you, lua, I would have used something like 'vthis', which would be boring and probably incorrect) */
static void vmethod0(Fake* self) {
printf("Fake::vmethod0(%p)\n", self);
}
static void vmethod1(Fake* self, short argument) {
self->someField = argument;
}
static short vmethod2(Fake* self) {
return self->someField;
}
static void vmethod3(Fake* self) { }
static void vmethod4(Fake* self) { }
static void vmethod5(Fake* self) { }
static void vmethod6(Fake* self) { }
static void vmethod7(Fake* self) { }
static void vmethod8(Fake* self) { }
static void vdestructor(Fake* self) {
printf("Fake::vdestructor(%p)\n", self);
}
};
class DerivedFake : public Fake
{
static void** vtable;
static unsigned classVIndex;
public:
DerivedFake() : Fake(classVIndex, 0) { }
~DerivedFake() {
reinterpret_cast<void (*)(DerivedFake*)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[1])(this);
}
void method0() {
reinterpret_cast<void (*)(DerivedFake*)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[0])(this);
}
protected:
DerivedFake(unsigned instanceVIndex, short someField)
: Fake(instanceVIndex, someField) { }
static void vmethod0(DerivedFake* self) {
printf("DerivedFake::vmethod0(%p)\n", self);
}
static void vdestructor(DerivedFake* self) {
printf("DerivedFake::vdestructor(%p)\n", self);
Fake::vdestructor(self); /* call parent destructor */
}
};
/* Make the vtable */
void** Fake::vtable = (void*[]) {
(void*) &Fake::vmethod0, (void*) &Fake::vmethod1,
(void*) &Fake::vmethod2, (void*) &Fake::vmethod3,
(void*) &Fake::vmethod4, (void*) &Fake::vmethod5,
(void*) &Fake::vmethod6, (void*) &Fake::vmethod7,
(void*) &Fake::vmethod8, (void*) &Fake::vdestructor
};
/* Store the vtable and get the look-up index */
unsigned Fake::classVIndex = VirtualTableManager::getInstance().addElement(Fake::vtable);
/* Do the same for derived class */
void** DerivedFake::vtable = (void*[]) {
(void*) &DerivedFake::vmethod0, (void*) &Fake::vmethod1,
(void*) &Fake::vmethod2, (void*) &Fake::vmethod3,
(void*) &Fake::vmethod4, (void*) &Fake::vmethod5,
(void*) &Fake::vmethod6, (void*) &Fake::vmethod7,
(void*) &Fake::vmethod8, (void*) &DerivedFake::vdestructor
};
unsigned DerivedFake::classVIndex = VirtualTableManager::getInstance().addElement(DerivedFake::vtable);
int main_virtual(int argc, char** argv)
{
printf("size of 100 instances of Real including padding is %lu bytes\n"
"size of 100 instances of Fake including padding is %lu bytes\n",
sizeof(Real[100]), sizeof(Fake[100]));
Real *real = new Real;
Fake *fake = new Fake;
Fake *derived = new DerivedFake;
real->method1(123);
fake->method1(456);
derived->method1(789);
printf("real::method2() = %hi\n"
"fake::method2() = %hi\n"
"derived::method2() = %hi\n", real->method2(), fake->method2(), derived->method2());
real->method0();
fake->method0();
derived->method0();
delete real;
delete fake;
delete derived;
return 0;
}
Fear not, I do not normally put the definition in classes like that. I just did it here to hopefully improve readability. Anyway, the output:
size of 100 instances of Real including padding is 1600 bytes
size of 100 instances of Fake including padding is 800 bytes
real::method2() = 123
fake::method2() = 456
derived::method2() = 789
Real::method0()
Fake::vmethod0(0x1bd8040)
DerivedFake::vmethod0(0x1bd8060)
Real::~Real()
Fake::vdestructor(0x1bd8040)
DerivedFake::vdestructor(0x1bd8060)
Fake::vdestructor(0x1bd8060)
It might not be thread-safe, might contain a fearsome legion of bugs, and might also be relatively inefficient, but I hope it demonstrates my concept. It was tested on 64-bit Ubuntu with g++-4.7. I doubt there is any size benefit on 32-bit systems, and since I save less than a word (4 bytes, so much for that!) I had to put a field in there to make the effects show. Feel free to benchmark the speed though (please optimize it first if you do, I rushed this) or test the effects on other architectures/platforms and with other compilers (I'd like to see the results, so please share them if you do). Something similar may be useful when someone finds the need to make a 128/256-bit platform, creates a processor which has very limited memory support but incredible speed or with compilers that use like 21 bytes for the vtable on each instance.
edit:
Whoops, the code example was a derp. Fixed it.
One challenge with an array-based vtable is how you would link together several compiled source files. If each compiled file stored its own table, the linker would have to combine those tables together when producing the final binary. This increases the complexity of the linker, which now has to be made aware of this new C++-specific detail.
Additionally, the byte-saving techniques you described would be tricky to get right with multiple compilation units. What if you have two source files, each of which has few enough classes to use two bytes per vtable index, but which combined now need three bytes? In that case, the linker would have to rewrite the object files based on the new object size.
Additionally, this new system would not interact well with dynamic linking. If you had a separate object file that was linked in at runtime, you would have two or more global tables of vtables. The generated object code would then have to take this into account, which would increase code generator complexity.
Finally, there's alignment issues. Using two or four bytes for the index when the word size is eight bytes might degrade program performance if it offset all the other fields of the object. In fact, its entirely possible that g++ only uses four bytes, but then pads to eight.
In short, there is no reason why you couldn't do this optimization, but it comes at a significant implementation complexity and (possibly) at a runtike cost. That said, it's a very clever idea!
Hope this helps!