Search code examples
c++performancedictionaryvtable

Which performs faster? vtable look-up with N derived types, or std::map look-up with N elements?


#include <iostream>
#include <map>
#include <ctime>

struct Base { virtual void foo() {} };
struct A : Base { void foo() {} };
struct B : Base { void foo() {} };
struct C : Base { void foo() {} };

A* protoA = new A;  B* protoB = new B;  C* protoC = new C;

enum Tag {a, b, c};

std::map<Tag, Base*> protoMap = { {a, protoA}, {b, protoB}, {c, protoC} };

void goo(Base* base) {base->foo();}

void hoo(Tag tag) {protoMap[tag]->foo();}

struct Timer {
    const std::clock_t begin;
    Timer() : begin (std::clock()) {}
    ~Timer() {
        const std::clock_t end = std::clock();
        std::cout << double (end - begin) / CLOCKS_PER_SEC << " seconds." << std::endl;
    };
};

int main() {
    const long N = 10000000;
    {
        Timer timer;
        for (int i = 0; i < N; i++)
            goo(new C);  // using vtable
    }  // 0.445 seconds.
    {
        Timer timer;
        for (int i = 0; i < N; i++)
            hoo(c);  // using map
    }  // 0.605 seconds.
    std::cin.get();
}

But my test only uses three derived classes, and I don't know how to define thousands of derived classes to benchmark properly. Does anyone know the answer already so that I don't have to think of how to run better tests?

I'm in a situation where I can easily use a map, but to attempt to improve the performance with vtable look-up would require redesigning a lot of stuff I already have (add a new data member to a class, define a new constructor, implement a visitor pattern, etc...).


Solution

  • Without a doubt, the fastest way to go is the vtable.

    Experimentation:

    You have to run your benchmark in similar conditions. It's not reasonable to allocate a new element for every call for the vtable benchmark, but not to allocate anything in the map approach.

    I ran the test using a vtable approach using a pointer to an already allocated element. I got 16 ms for the vtable and 78 ms for the map approach.

    If you want to go further and rally compare the 1000th derivates, you have to work with a recursive template.

    Explanation of the result:

    The vtable lookup generally uses a fixed offset in the vtable of the object, and not a search. So even with 1000 derivations, you'll have a constant reaction time in O(1).

    The map will need to do an expensive search operation on the key, which is O(log(n)).

    Additional remarks:

    The map implementation that you propose will not work in reallife, because you always call the member function for ONE SINGLE predetermined object: the one you stored in the map. The virtual funtion will work with any object of the family.