c, gcc, assembly, compiler-optimization

Compiler changes printf to puts


Consider the following code:

#include <stdio.h>

void foo() {
    printf("Hello world\n");
}

void bar() {
    printf("Hello world");
}

The assembly produced for these two functions is:

.LC0:
        .string "Hello world"
foo():
        mov     edi, OFFSET FLAT:.LC0
        jmp     puts
bar():
        mov     edi, OFFSET FLAT:.LC0
        xor     eax, eax
        jmp     printf

I know the difference between puts and printf, but I find it quite interesting that gcc is able to inspect the const char* and figure out whether to call printf or puts.

Another interesting thing is that in bar, the compiler zeroed out the return register (eax) even though it is a void function. Why did it do that there and not in foo?

Am I correct in assuming that the compiler 'introspected my string', or is there another explanation for this?


Solution

  • Am I correct in assuming that the compiler 'introspected my string', or is there another explanation for this?

    Yes, this is exactly what happens. It's a pretty simple and common optimization done by the compiler.

    Your first printf() call is just:

    printf("Hello world\n");
    

    It's equivalent to:

    puts("Hello world");
    

    Since puts() does not need to scan and parse the string for format specifiers, it can be faster than printf(). The compiler notices that your string is a compile-time constant that ends with a newline and contains no format specifiers, so it converts the call automatically and drops the trailing newline, which puts() appends itself.

    This also saves a bit of space here: foo and bar can share the single string "Hello world" in the resulting binary, as the lone .LC0 label in the assembly shows.
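
The condition described above can be sketched as a small predicate. This is only a model of the rule as stated in this answer, not GCC's actual implementation; the name can_use_puts is hypothetical, and it assumes the printf() call has no extra arguments and its return value is unused.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of GCC: returns true when printf(fmt),
 * called with no further arguments, prints exactly what puts() would
 * print for the same text minus its trailing newline. */
bool can_use_puts(const char *fmt) {
    size_t len = strlen(fmt);

    /* puts() always appends '\n', so the format must already end with one. */
    if (len == 0 || fmt[len - 1] != '\n')
        return false;

    /* Any '%' (even the escaped "%%") means printf() has to interpret
     * the string, so the text cannot be handed to puts() verbatim. */
    return strchr(fmt, '%') == NULL;
}
```

With this model, "Hello world\n" from foo qualifies, while "Hello world" from bar does not because it lacks the trailing newline.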

    Note that this is not possible in general for calls of the form:

    printf(some_var);
    

    If some_var is not a constant string known at compile time, the compiler cannot know whether it ends in \n or whether it contains format specifiers.

    Other common optimizations are:

    • strlen("constant string") can be evaluated at compile time and replaced by a constant.
    • memmove(location1, location2, sz) can be transformed into memcpy() if the compiler is sure that location1 and location2 don't overlap.
    • memcpy() of a small, fixed size can be converted into a single mov instruction, and even when the size is larger the call can sometimes be inlined to be faster.
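
The functions below are a minimal sketch of source patterns where these optimizations can apply. The function names are made up for illustration, and using restrict to express "no overlap" is an assumption about how the compiler might be given that guarantee. Whether a particular compiler and optimization level actually performs each transformation is not guaranteed; inspecting the generated assembly is the only way to be sure.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* strlen() of a string literal: the length (15 here) is known at
 * compile time, so the call may be folded into a constant. */
size_t literal_length(void) {
    return strlen("constant string");
}

/* restrict promises that dst and src do not overlap, which is the kind
 * of guarantee that allows memmove() to be treated like memcpy(). */
void copy_no_overlap(char *restrict dst, const char *restrict src, size_t n) {
    memmove(dst, src, n);
}

/* A 4-byte memcpy() into a local variable: commonly compiled to a
 * single load instruction instead of a library call. */
uint32_t load_u32(const unsigned char *bytes) {
    uint32_t value;
    memcpy(&value, bytes, sizeof value);
    return value;
}
```

The behavior of each function is the same whether or not the optimization happens; only the generated code differs.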

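    Another interesting thing is that in bar, the compiler zeroed out the return register (eax) even though it is a void function. Why did it do that there and not in foo?

    See the question "Why is %eax zeroed before a call to printf?". In short, eax is not acting as a return register here. printf() is variadic, and the x86-64 System V calling convention requires the caller of a variadic function to pass in al an upper bound on the number of vector registers used for arguments. No floating-point arguments are passed, so the compiler sets eax to zero. puts() is not variadic, so foo needs no such setup.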

