I was just learning about format string vulnerabilities, which led me to this question.
Consider the following simple program:
#include<stdio.h>
void main(int argc, char **argv)
{
char *s="SomeString";
printf(argv[1]);
}
Now clearly, this code has a format string vulnerability: when the command line argument is %s, the value SomeString is printed, since printf pops the stack once.
What I don't understand is the structure of the stack when printf is called.
In my head I imagine the stack to be as follows:
grows from left to right ----->
main() ---> printf()-->
RET to libc_main | address of 's' | current registers| ret ptr to main | ptr to format string|
If this is the case, how does passing %s to the program cause the value of s to be popped?
(Or, if I am totally wrong about the stack structure, please correct me.)
The stack contents depend a lot on the compiler, its optimization settings and the target platform.
This is what I get by compiling your tiny program with x86 MinGW using gcc stk.c -S -o stk.s:
.file "stk.c"
.def ___main; .scl 2; .type 32; .endef
.section .rdata,"dr"
LC0:
.ascii "SomeString\0"
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB6:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $32, %esp
call ___main
movl $LC0, 28(%esp)
movl 12(%ebp), %eax
addl $4, %eax
movl (%eax), %eax
movl %eax, (%esp)
call _printf
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE6:
.def _printf; .scl 2; .type 32; .endef
And this is what I get using gcc stk.c -S -O2 -o stk.s, that is, with optimizations enabled:
.file "stk.c"
.def ___main; .scl 2; .type 32; .endef
.section .text.startup,"x"
.p2align 2,,3
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB7:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $16, %esp
call ___main
movl 12(%ebp), %eax
movl 4(%eax), %eax
movl %eax, (%esp)
call _printf
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE7:
.def _printf; .scl 2; .type 32; .endef
As you can see, in the latter case there's no pointer to "SomeString" on the stack. In fact, the string isn't even present in the compiled code.
In this simple code there are no registers saved on the stack because there aren't any variables allocated to registers that need to be preserved across the call to printf().
So, the only things you get on the stack here are the string pointer (optionally), unused space due to stack alignment (andl $-16, %esp + subl $32, %esp align the stack and allocate space for local variables, none here), printf()'s parameter, and the return address for returning from printf() back to main().
In the former case the pointer to "SomeString" and printf()'s parameter (the value of argv[1]) are quite far away from one another:
movl $LC0, 28(%esp) # address of "SomeString" is stored at esp+28
movl 12(%ebp), %eax
addl $4, %eax
movl (%eax), %eax
movl %eax, (%esp) # copy of argv[1] (pointer to the format string) is stored at esp
call _printf
To get the two addresses stored one right after the other on the stack, if that's what you want, you'd need to play with the code or the compilation/optimization options, or use a different compiler.
Or you could supply a format string in argv[1] such that printf() would reach the pointer to "SomeString". You could, for example, include a number of fake parameters in the format string.
For example, if I compile this piece of code using gcc stk.c -o stk.exe and run it as stk.exe %u%u%u%u%u%u%s, I'll get the following output from it:
4200532268676042006264200532880015253SomeString
All of this is pretty hacky and it's not trivial to make it work right.