Need a detailed explanation of a buffer overflow (BOF)

So I was following a tutorial about buffer overflow with the following code:

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  volatile int modified;
  char buffer[64];

  modified = 0;
  gets(buffer);

  if(modified != 0) {
      printf("you have changed the 'modified' variable\n");
  } else {
      printf("Try again?\n");
  }
}

Beforehand I run sudo sysctl -w kernel.randomize_va_space=0 to disable address space layout randomization (ASLR), so that the stack-smashing (buffer overflow) exploit behaves the same on every run. I then compile the program with gcc:

gcc protostar.c -g -z execstack -fno-stack-protector -o protostar

-g is to allow debugging in gdb ('list main')
-z execstack marks the stack as executable, and -fno-stack-protector disables the stack canary, so together they remove the stack protections

and then execute it:

python -c 'print "A"*76' | ./protostar

Try again?

python -c 'print "A"*77' | ./protostar

you have changed the 'modified' variable

I do not understand why the overflow only takes effect at 77 characters when I expected it at 65. That is a difference of 12 bytes. Does anyone have a clear explanation?

Also it remains this way from 77 to 87:

python -c 'print "A"*87' | ./protostar
you have changed the 'modified' variable

From 88 onward it also segfaults:

python -c 'print "A"*88' | ./protostar
you have changed the 'modified' variable
Segmentation fault (core dumped)

Regards

Solution

To fully understand what's happening, it's first important to make note of how your program is laying out memory.

From your comment, in this particular run buffer starts at 0x7fffffffdf10 and modified starts at 0x7fffffffdf5c. (With randomize_va_space set to 0, ASLR is disabled, so these addresses should generally stay the same across runs in the same environment.)

So you have something like this:

0x7fffffffdf10            0x7fffffffdf50      0x7fffffffdf5c
↓                         ↓                   ↓
(64 byte buffer)..........(some 12 bytes).....(modified)....

Essentially, you have the 64-byte buffer, and when that ends there are 12 bytes that hold something else on the stack (possibly other stack data such as 4 bytes for argc and 8 bytes for argv, or simply alignment padding; I have not verified which). modified comes after that, starting exactly 64 + 12 = 76 bytes after the start of buffer.

Therefore, when you write between 65 and 76 characters into the 64-byte buffer, the input runs past the end and lands in the 12 bytes between buffer and modified. With exactly 76 characters, the terminating null byte that gets appends does land on the first byte of modified, but it writes a 0, so modified still equals 0. The 77th character is the first nonzero byte written into modified, which is why you then see the "you have changed the 'modified' variable" message.

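You also asked: "why does it work if I go up to 87 and then at 88 there's a segfault?" The writes themselves land in valid, writable stack memory, so the kernel does not kill the process at the moment of the write; that is why the message is still printed before the crash. The most likely explanation, assuming a typical x86-64 stack frame, is that the 8 bytes after modified (offsets 80 to 87 from the start of buffer) hold the saved frame pointer, and the 8 bytes after that (starting at offset 88) hold main's return address. With 87 characters, gets writes 87 bytes plus a terminating null byte, which stops just short of the return address. With 88 characters, the null byte overwrites the lowest byte of the return address, so main returns to a bogus address and the process receives a segmentation fault. All of this is undefined behavior, so none of it is guaranteed.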

Note that you should never use gets in practice, and this is the reason: it has no way of knowing how large the destination is, so any input longer than the buffer overwrites whatever follows it. It was removed from the language in C11; use fgets with an explicit size instead. Also note that the behavior you're seeing is not what I see on my machine when I run it. That is normal, because this is undefined behavior and there are no guarantees about what will happen. On my machine, modified actually comes before buffer in memory, so I never see it get overwritten. I think this is a good example of why undefined behavior like this is so unpredictable.