Why is execution-time method resolution faster than compile-time resolution?

At school, we learned about virtual functions in C++, and how they are resolved (or found, or matched, I don't know what the terminology is -- we're not studying in English) at execution time instead of compile time. The teacher also told us that compile-time resolution is much faster than execution-time resolution (and it would make sense for it to be so). However, a quick experiment suggests otherwise. I've built this small program:

#include <iostream>
#include <limits.h>

using namespace std;

class A {
    public:
    void f() {
        // do nothing
    }
};

class B: public A {
    public:
    void f() {
        // do nothing
    }
};

int main() {
    unsigned int i;
    A *a = new B;
    for (i=0; i < UINT_MAX; i++) a->f();
    return 0;
}

I compiled the program above and named it normal. Then, I modified A to look like this:

class A {
    public:
    virtual void f() {
        // do nothing
    }
};

I compiled this version and named it virtual. Here are my results:

[felix@the-machine C]$ time ./normal 

real    0m25.834s
user    0m25.742s
sys     0m0.000s
[felix@the-machine C]$ time ./virtual 

real    0m24.630s
user    0m24.472s
sys     0m0.003s
[felix@the-machine C]$ time ./normal 

real    0m25.860s
user    0m25.735s
sys     0m0.007s
[felix@the-machine C]$ time ./virtual 

real    0m24.514s
user    0m24.475s
sys     0m0.000s
[felix@the-machine C]$ time ./normal 

real    0m26.022s
user    0m25.795s
sys     0m0.013s
[felix@the-machine C]$ time ./virtual 

real    0m24.503s
user    0m24.468s
sys     0m0.000s

There seems to be a steady ~1 second difference in favor of the virtual version. Why is this?

Relevant or not: dual-core Pentium @ 2.80 GHz, no extra applications running between the two tests. Arch Linux with gcc 4.5.0. Compiled normally, like this:

$ g++ test.cpp -o normal

-Wall doesn't produce any warnings, either.

Edit: I have separated my program into A.cpp, B.cpp and main.cpp. I also made f() (both A::f() and B::f()) actually do something (x = 0 - x, where x is a public int member of A, initialized to 1 in A::A()). I compiled this into six versions; here are my final results:

[felix@the-machine poo]$ time ./normal-unoptimized 

real    0m31.172s
user    0m30.621s
sys     0m0.033s
[felix@the-machine poo]$ time ./normal-O2

real    0m2.417s
user    0m2.363s
sys     0m0.007s
[felix@the-machine poo]$ time ./normal-O3

real    0m2.495s
user    0m2.447s
sys     0m0.000s
[felix@the-machine poo]$ time ./virtual-unoptimized 

real    0m32.386s
user    0m32.111s
sys     0m0.010s
[felix@the-machine poo]$ time ./virtual-O2

real    0m26.875s
user    0m26.668s
sys     0m0.003s
[felix@the-machine poo]$ time ./virtual-O3

real    0m26.905s
user    0m26.645s
sys     0m0.017s

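Unoptimized, there is still a difference of about 1 second, although in these runs it is the normal build that comes out ahead (31.2 s vs. 32.4 s), which I find a bit peculiar. But this was a nice experiment, and I would like to thank all of you for your answers!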

Solution

Profiling unoptimised code is pretty much meaningless. Use -O2 to produce a meaningful result. Using -O3 may result in even faster code, but it may not give a realistic outcome unless you compile A::f and B::f separately from main (i.e., in separate compilation units), so that the compiler cannot inline them into the loop.

Based on the feedback, perhaps even -O2 is too aggressive. The 2 ms result is because the compiler optimised the loop away entirely. Direct calls aren't that fast; in fact, it ought to be very difficult to observe any appreciable difference. Move the implementations of f into a separate compilation unit to get real numbers. Define the classes in a .h, but define A::f and B::f in their own .cc file.
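As a rough illustration of that layout, here is a single-file sketch rather than the asker's actual code. The comments mark where the .h and .cc boundaries would fall. The names USE_VIRTUAL, MAYBE_VIRTUAL, NOINLINE and time_calls are hypothetical, and x is given a default member initialiser instead of being set in A::A(). The noinline attribute is a GCC/Clang extension that only approximates a separate compilation unit: the optimiser can still see the function bodies, so a genuine split into files (without link-time optimisation) is the more trustworthy setup.

```cpp
#include <chrono>

// Assumption: build twice, once as-is (direct call) and once with
// -DUSE_VIRTUAL (virtual call). The macro names are made up for this sketch.
#ifdef USE_VIRTUAL
#define MAYBE_VIRTUAL virtual
#else
#define MAYBE_VIRTUAL
#endif

// GCC/Clang extension that forbids inlining; it stands in for moving the
// definitions into their own .cc file.
#define NOINLINE __attribute__((noinline))

// ---- would live in the shared .h ----
class A {
public:
    int x = 1;  // the question initialises x to 1 in A::A()
    MAYBE_VIRTUAL ~A() = default;
    MAYBE_VIRTUAL void f();
};

class B : public A {
public:
    void f();  // overrides A::f only when A::f is virtual, otherwise hides it
};

// ---- would live in the .cc file(s) ----
NOINLINE void A::f() { x = 0 - x; }
NOINLINE void B::f() { x = 0 - x; }

// ---- would live in main.cc ----
// Calls f() n times through a base pointer and returns the elapsed seconds.
double time_calls(A* a, unsigned int n) {
    const auto start = std::chrono::steady_clock::now();
    for (unsigned int i = 0; i < n; ++i) a->f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling time_calls(&b, UINT_MAX) on a B object from main (with UINT_MAX from <climits>), once in each build at the same optimisation level, would give the two numbers to compare. Because f() flips the sign of x on every call, the final value of x also serves as a quick check that the calls were not removed.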