c linux language-lawyer compiler-bug

What is the behavior of printing NULL with printf's %s specifier?


Came across an interesting interview question:

test 1:
printf("test %s\n", NULL);
printf("test %s\n", NULL);

prints:
test (null)
test (null)

test 2:
printf("%s\n", NULL);
printf("%s\n", NULL);
prints:
Segmentation fault (core dumped)

This might run fine on some systems, but mine throws a segmentation fault. What is the best explanation for this behavior? The code above is C.

Following is my gcc info:

deep@deep:~$ gcc --version
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3

Solution

  • First things first: printf expects a valid (i.e. non-NULL) pointer for its %s argument, so passing it NULL is officially undefined behavior. It may print "(null)" or it may delete all files on your hard drive; either is correct behavior as far as ANSI is concerned (at least, that's what Harbison and Steele tell me).

    That being said, yeah, this is really weird behavior. What's happening is that when you do a simple printf like this:

    printf("%s\n", NULL);
    

    gcc is (ahem) smart enough to deconstruct this into a call to puts. The first printf, this:

    printf("test %s\n", NULL);
    

    is complicated enough that gcc will instead emit a call to real printf.

    (Notice that gcc emits warnings about your invalid printf argument when you compile. That's because it long ago developed the ability to parse *printf format strings.)

    You can see this yourself by compiling with the -save-temps option and then looking through the resulting .s file.

    When I compiled the first example, I got:

    movl    $.LC0, %eax
    movl    $0, %esi
    movq    %rax, %rdi
    movl    $0, %eax
    call    printf      ; <-- Actually calls printf!
    

    (Comments were added by me.)

    But the second one produced this code:

    movl    $0, %edi    ; Stores NULL in the puts argument list
    call    puts        ; Calls puts
    

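    Note that this optimization is valid: for any valid string it produces the same output, because puts appends a newline character after the string, which matches the "\n" at the end of the format string. The two forms only diverge when the argument is NULL, which is undefined behavior anyway.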
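
Since passing NULL for %s is undefined no matter which function the compiler ends up calling, one way to sidestep the whole question is to substitute a placeholder before the pointer reaches printf. The following is only a minimal sketch: str_or_placeholder and print_line are hypothetical helper names invented for this illustration and are not part of the C library or gcc.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (an assumption for this sketch, not a library call):
   returns s unchanged, or a placeholder literal when s is NULL, so that
   %s (or puts) never receives a null pointer. */
const char *str_or_placeholder(const char *s)
{
    return s != NULL ? s : "(null)";
}

/* Prints one line. Because the argument is never NULL here, the output is
   well defined whether the compiler emits a call to printf or rewrites the
   call into puts. */
void print_line(const char *s)
{
    printf("%s\n", str_or_placeholder(s));
}
```

With these helpers, print_line(NULL) prints "(null)" on any conforming implementation instead of relying on what a particular libc or compiler happens to do.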