c++ · gcc · x86 · clang · compiler-optimization

Why does GCC generate a faster program than Clang in this recursive Fibonacci code?


This is the code that I tested:

#include <iostream>
#include <chrono>
using namespace std;

#define CHRONO_NOW                  chrono::high_resolution_clock::now()
#define CHRONO_DURATION(first,last) chrono::duration_cast<chrono::duration<double>>(last-first).count()

int fib(int n) {
    if (n<2) return n;
    return fib(n-1) + fib(n-2);
}

int main() {
    auto t0 = CHRONO_NOW;
    cout << fib(45) << endl;
    cout << CHRONO_DURATION(t0, CHRONO_NOW) << endl;
    return 0;
}

Of course, there are much faster ways of calculating Fibonacci numbers, but this is a good little stress test that focuses on recursive function calls. There's nothing else to the code, other than the use of chrono for measuring time.
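For reference, a linear-time alternative just iterates with two accumulators; a minimal sketch (fib_iter is only an illustrative name):

int fib_iter(int n) {
    // Bottom-up Fibonacci: at the start of each iteration, a == fib(i) and b == fib(i+1).
    int a = 0, b = 1;
    for (int i = 0; i < n; ++i) {
        int next = a + b;
        a = b;
        b = next;
    }
    return a;
}

(fib(45) = 1134903170, so the result still fits in a 32-bit int.)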

First I ran the test a couple of times in Xcode on OS X (so that's clang), using -O3 optimization. It took about 9 seconds to run.

Then I compiled the same code with gcc (g++) on Ubuntu (again with -O3), and that version took only about 6.3 seconds to run! Note that I was running Ubuntu inside VirtualBox on my Mac, which if anything should hurt performance, not help it.

So there you go:

  • Clang on OS X -> ~9 seconds
  • gcc on Ubuntu in VirtualBox -> ~6.3 seconds

I know that these are completely different compilers that do things differently, but every gcc-vs-clang comparison I've seen showed a much smaller gap, and in some cases the difference went the other way (clang being faster).

So is there any logical explanation why gcc beats clang by miles in this particular example?


Solution

  • I wouldn't say that gcc beats clang by miles. In my opinion, the performance difference (6.3 seconds vs 9 seconds) is rather small. On my FreeBSD system, clang requires 26.12 seconds and gcc requires 10.55 seconds.

    However, the way to investigate this is to compile with g++ -S and clang++ -S and compare the assembly output.

    I tested this on my FreeBSD system. The assembly files are too long to post here, but it appears that gcc inlines the Fibonacci function into itself several levels deep (there were 20 fib() call sites in there!), whereas clang emits plain calls to fib(n-1) and fib(n-2) with no inlining at all; a source-level sketch of what that inlining looks like follows.
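    To see what that means at the source level, here is a hand-written sketch (not actual compiler output) of fib() with one level of its recursive calls expanded in place:

        int fib_inlined_once(int n) {
            if (n < 2) return n;
            // Body of fib(n-1) substituted for the call: its base case plus its two recursive calls.
            int a = (n - 1 < 2) ? n - 1
                                : fib_inlined_once(n - 2) + fib_inlined_once(n - 3);
            // Body of fib(n-2) substituted likewise.
            int b = (n - 2 < 2) ? n - 2
                                : fib_inlined_once(n - 3) + fib_inlined_once(n - 4);
            return a + b;
        }

    Each extra level of inlining roughly doubles the number of call sites that end up in the function body, which is consistent with the ~20 fib() calls in gcc's output, and every inlined call is one call/return (and stack frame) the program no longer pays for.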

    By the way, my gcc version was 4.2.1 20070831 patched [FreeBSD] and my clang version was 3.1 (branches/release_31 156863) 20120523. These are the versions that ship with the FreeBSD 9.1-RELEASE base system. The CPU is an AMD Turion II Neo N40L dual-core processor (1497.54 MHz).