
Stack View when printf is called?


I was just learning about format string vulnerabilities, which led me to this question.

Consider the following simple program:

#include<stdio.h>
void main(int argc, char **argv)
{
char *s="SomeString";
printf(argv[1]);
}

Now clearly, this code is vulnerable to a format string attack: when the command-line argument is %s, the value SomeString is printed, since printf pops the stack once.

What I don't understand is the structure of the stack when printf is called.

In my head I imagine the stack to be as follows:

grows from left to right ----->

main()                                                                  ---> printf()-->
RET to libc_main | address of 's' | current registers| ret ptr to main | ptr to format string| 

If this is the case, how does passing %s to the program cause the value of s to be popped?

(Or, if I am totally wrong about the stack structure, please correct me.)


Solution

  • The stack contents depend a lot on the following:

    • the CPU
    • the compiler
    • the calling conventions (i.e. how parameters are passed in the registers and on the stack)
    • the code optimizations performed by the compiler

    This is what I get by compiling your tiny program with x86 mingw using gcc stk.c -S -o stk.s:

            .file   "stk.c"
            .def    ___main;        .scl    2;      .type   32;     .endef
            .section .rdata,"dr"
    LC0:
            .ascii "SomeString\0"
            .text
            .globl  _main
            .def    _main;  .scl    2;      .type   32;     .endef
    _main:
    LFB6:
            .cfi_startproc
            pushl   %ebp
            .cfi_def_cfa_offset 8
            .cfi_offset 5, -8
            movl    %esp, %ebp
            .cfi_def_cfa_register 5
            andl    $-16, %esp
            subl    $32, %esp
            call    ___main
            movl    $LC0, 28(%esp)
            movl    12(%ebp), %eax
            addl    $4, %eax
            movl    (%eax), %eax
            movl    %eax, (%esp)
            call    _printf
            leave
            .cfi_restore 5
            .cfi_def_cfa 4, 4
            ret
            .cfi_endproc
    LFE6:
            .def    _printf;        .scl    2;      .type   32;     .endef
    

    And this is what I get using gcc stk.c -S -O2 -o stk.s, that is, with optimizations enabled:

            .file   "stk.c"
            .def    ___main;        .scl    2;      .type   32;     .endef
            .section        .text.startup,"x"
            .p2align 2,,3
            .globl  _main
            .def    _main;  .scl    2;      .type   32;     .endef
    _main:
    LFB7:
            .cfi_startproc
            pushl   %ebp
            .cfi_def_cfa_offset 8
            .cfi_offset 5, -8
            movl    %esp, %ebp
            .cfi_def_cfa_register 5
            andl    $-16, %esp
            subl    $16, %esp
            call    ___main
            movl    12(%ebp), %eax
            movl    4(%eax), %eax
            movl    %eax, (%esp)
            call    _printf
            leave
            .cfi_restore 5
            .cfi_def_cfa 4, 4
            ret
            .cfi_endproc
    LFE7:
            .def    _printf;        .scl    2;      .type   32;     .endef
    

    As you can see, in the latter case there's no pointer to "SomeString" on the stack. In fact, the string isn't even present in the compiled code.

    In this simple code there are no registers saved on the stack because there aren't any variables allocated to registers that need to be preserved across the call to printf().

    So the only things you get on the stack here are the string pointer (in the unoptimized build only), unused space left by stack alignment and frame allocation, printf()'s parameter, and the return address for returning from printf() back to main(). The alignment and allocation come from andl $-16, %esp, which rounds esp down to a 16-byte boundary, and subl $32, %esp (subl $16, %esp with -O2), which reserves room for locals and outgoing arguments.

    In the former case, the pointer to "SomeString" and printf()'s parameter (the value of argv[1]) are 28 bytes apart:

            movl    $LC0, 28(%esp)  # address of "SomeString" is stored at esp+28
            movl    12(%ebp), %eax
            addl    $4, %eax
            movl    (%eax), %eax
            movl    %eax, (%esp)    # value of argv[1] (the format string pointer) is stored at esp
            call    _printf
    

    To get the two addresses stored one right after the other on the stack, if that's what you want, you'd need to play with the code or the compilation/optimization options, or use a different compiler.

    Or you could supply a format string in argv[1] that makes printf() read far enough up the stack to reach that pointer. You could, for example, put a number of dummy conversions such as %u in front of %s; each one consumes one stack slot.

    For example, if I compile your program with gcc stk.c -o stk.exe and run it as stk.exe %u%u%u%u%u%u%s, I get the following output:

    4200532268676042006264200532880015253SomeString
    

    All of this is pretty hacky and it's not trivial to make it work right.