Tags: performance, language-agnostic, compiler-construction

Why are compilers so stupid?


I always wonder why compilers can't figure out simple things that are obvious to the human eye. They do lots of simple optimizations, but never anything even slightly complex. For example, this code takes about 6 seconds on my computer to print the value zero (using Java 1.6):

int x = 0;
// note: 100 * 1000 * 1000 * 1000 overflows int, so the bound is actually 1215752192
for (int i = 0; i < 100 * 1000 * 1000 * 1000; ++i) {
    x += x + x + x + x + x;
}
System.out.println(x);

It is totally obvious that x never changes: no matter how often you add zero to itself, it stays zero. So the compiler could, in theory, replace the whole thing with System.out.println(0).
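To spell out the reasoning an optimizer would need, here is an illustrative hand rewrite (this is not anything javac or the JIT actually emits):

int x = 0;
for (int i = 0; i < 1215752192; ++i) {
    x = 6 * x;  // x + x + x + x + x + x is just 6 * x
}
// x starts at 0 and 6 * 0 == 0, so x is provably 0 on every
// iteration; the loop is dead code and only the print remains:
System.out.println(0);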

Or even better, this takes 23 seconds:

public int slow() {
   String s = "x";
   for (int i = 0; i < 100000; ++i) {
       s += "x";
   }
   return 10;
}

First, the compiler could notice that I am really just building a long string of "x" characters, so it could automatically use a StringBuilder instead, or, better yet, directly substitute the resulting string, since it is always the same. Second, it does not recognize that I never actually use the string at all, so the whole loop could be discarded!
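For illustration, here is roughly the rewrite being asked for, as a hand-written sketch (the method name fast is mine; javac does not perform this transformation on loops):

public int fast() {
    // Reuse one buffer instead of allocating a new String per iteration.
    StringBuilder sb = new StringBuilder("x");
    for (int i = 0; i < 100000; ++i) {
        sb.append("x");
    }
    String s = sb.toString();
    // s is still never read, so a smarter compiler could delete
    // everything above this line as dead code.
    return 10;
}

(javac does use a StringBuilder, or invokedynamic on newer JDKs, for each individual s += "x", but it does not hoist the builder out of the loop, which is why the original version is quadratic.)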

Why, after so much manpower has gone into fast compilers, are they still so relatively dumb?

EDIT: Of course these are stupid examples that should never be used anywhere. But whenever I have to rewrite beautiful, very readable code into something unreadable so that the compiler is happy and produces fast code, I wonder why compilers or some other automated tool can't do this work for me.


Solution

  • Oh, I don't know. Sometimes compilers are pretty smart. Consider the following C program:

    #include <stdio.h>  /* printf() */
    
    int factorial(int n) {
       return n == 0 ? 1 : n * factorial(n - 1);
    }
    
    int main() {
       int n = 10;
    
       printf("factorial(%d) = %d\n", n, factorial(n));
    
       return 0;
    }
    

    On my version of GCC (4.3.2 on Debian testing), when compiled with no optimizations or with -O1, it generates code for factorial() like you'd expect, using a recursive call to compute the value. But on -O2, it does something interesting: it compiles factorial() down to a tight loop:

        factorial:
       .LFB13:
               testl   %edi, %edi
               movl    $1, %eax
               je  .L3
               .p2align 4,,10
               .p2align 3
       .L4:
               imull   %edi, %eax
               subl    $1, %edi
               jne .L4
       .L3:
               rep
               ret
    

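    In source form, the loop GCC produced corresponds roughly to this iterative version (a sketch in Java, to match the question's examples; the mapping to the instructions above is noted in the comments):

        static int factorial(int n) {
            int result = 1;      // movl $1, %eax
            while (n != 0) {     // testl %edi, %edi / je .L3, then jne .L4 at the bottom
                result *= n;     // imull %edi, %eax
                n -= 1;          // subl $1, %edi
            }
            return result;
        }
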
    Pretty impressive. The recursive call (not even tail-recursive) has been completely eliminated, so factorial now uses O(1) stack space instead of O(N). And although I have only very superficial knowledge of x86 assembly (actually AMD64 in this case, but I don't think any of the AMD64 extensions are being used above), I doubt that you could write a better version by hand. But what really blew my mind was the code that it generated on -O3. The implementation of factorial stayed the same. But main() changed:

        main:
       .LFB14:
               subq    $8, %rsp
       .LCFI0:
               movl    $3628800, %edx
               movl    $10, %esi
               movl    $.LC0, %edi
               xorl    %eax, %eax
               call    printf
               xorl    %eax, %eax
               addq    $8, %rsp
               ret
    

    See the movl $3628800, %edx line? GCC is pre-computing factorial(10) at compile time. It doesn't even call factorial(). Incredible. My hat is off to the GCC development team.
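    Java compilers do a modest version of the same trick, for what it's worth: constant expressions are folded at compile time. A sketch (the class and field names are mine):

        class Folded {
            // javac evaluates this constant expression itself; the compiled
            // class file stores only the literal 3628800, not the chain of
            // multiplications.
            static final int FACTORIAL_10 = 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1;
        }

    Folding a call to a recursive function, as GCC does here, is of course a much deeper analysis.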

    Of course, all the usual disclaimers apply (this is just a toy example, premature optimization is the root of all evil, etc.), but it illustrates that compilers are often smarter than you think. If you think you can do a better job by hand, you're almost certainly wrong.

    (Adapted from a posting on my blog.)