Why is the following code susceptible to a heap overflow attack?

I'm new to cyber security, and I am trying to understand why the following code is susceptible to a heap overflow attack...

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct data {
 char name[128];
};
struct fp {
 int (*fp)();
};
int printName() {
 printf("Printing function...\n");
 return 0;
}
int main(int argc, char **argv) {
 struct data *d;
 struct fp *f;
 d = malloc(sizeof(struct data));
 f = malloc(sizeof(struct fp));
 f->fp = printName;
 read(STDIN_FILENO,d->name,256);

 f->fp();
}

Is it because read(STDIN_FILENO, d->name, 256) reads up to 256 bytes into name, which is only 128 bytes long in struct data?

Any help would be great.

Solution

A heap overflow is a buffer overflow in which the overrun buffer lives on the heap. It works much like the classic stack-based buffer overflow, except that instead of overwriting values on the stack, the attacker tramples data in the heap.

Notice in your code that there are two dynamically allocated values:

d = malloc(sizeof(struct data));
f = malloc(sizeof(struct fp));

So d now holds the address of a 128-byte chunk of memory in the heap, while f holds the address of an 8-byte chunk (assuming a 64-bit machine, where a function pointer is 8 bytes). In theory, these two addresses could be nowhere near each other. In practice, both requests are small and made back to back, so the allocator behind malloc will most likely carve them out of the same contiguous region and hand you pointers that sit next to each other, separated by at most a few bytes of allocator bookkeeping.
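If you want to check how close the two chunks really are on your own machine, a minimal sketch like the following prints both addresses and the gap between them. The helper names byte_distance and show_heap_layout are made up for this illustration, and the result depends entirely on your allocator.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct data {
    char name[128];
};

struct fp {
    int (*fp)(void);
};

/* Hypothetical helper: signed distance in bytes from a to b.
 * The pointers are converted to integers first, because subtracting
 * pointers into two different allocations is undefined in C. */
long long byte_distance(const void *a, const void *b) {
    return (long long)((intptr_t)b - (intptr_t)a);
}

/* Hypothetical helper: allocates the two structs in the same order as
 * the question and prints where the allocator placed them.
 * Returns 0 on success, -1 if either allocation fails. */
int show_heap_layout(void) {
    struct data *d = malloc(sizeof *d);
    struct fp *f = malloc(sizeof *f);
    if (d == NULL || f == NULL) {
        free(d);
        free(f);
        return -1;
    }
    printf("d = %p\n", (void *)d);
    printf("f = %p\n", (void *)f);
    printf("f - d = %lld bytes\n", byte_distance(d, f));
    free(d);
    free(f);
    return 0;
}
```

Calling show_heap_layout() from main on a 64-bit Linux system with glibc will often report a gap of 144 bytes (the 128 bytes of d plus a small chunk header), but treat that as an observation about one allocator, not a guarantee. Any gap smaller than 256 - 8 = 248 bytes leaves f within reach of the oversized read.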

So once you run f->fp = printName;, your heap looks something like this:

Note: Each row is 8 bytes wide, and any allocator bookkeeping between the two chunks is left out

     |                        |
     +------------------------+
f -> | <Address of printName> |
     +------------------------+
     |           ▲            |
     |      15 more rows      |
     |       not shown        |
     |                        |
d -> |  <Uninitialized data>  |
     +------------------------+
     |                        |

Your initial assessment of where the vulnerability comes from is correct. d points to 128 bytes of memory, but you let the user write up to 256 bytes to that area. C performs no bounds checking, so neither the compiler nor the runtime stops read from writing past the end of d->name. If f sits right after d, the extra bytes run off the end of d and into f. An attacker can therefore modify the contents of f just by writing to d.
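For contrast, here is a minimal sketch of a bounded version of that read. It asks for no more bytes than name can hold, taking the limit from sizeof instead of a hard-coded number. The helper name read_name is an assumption for this example, and reserving one byte for a terminating NUL is a design choice of the sketch, not something the original code does.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

struct data {
    char name[128];
};

/* Hypothetical helper: reads at most sizeof d->name - 1 bytes from fd
 * into d->name and NUL-terminates the result.
 * Returns the number of bytes read, or -1 if read() fails. */
ssize_t read_name(int fd, struct data *d) {
    ssize_t n = read(fd, d->name, sizeof d->name - 1);
    if (n < 0) {
        d->name[0] = '\0';   /* leave an empty string behind on error */
        return -1;
    }
    d->name[n] = '\0';       /* n <= 127, so this index is in bounds */
    return n;
}
```

In the question's program, read_name(STDIN_FILENO, d) would take the place of the read call. However much input the user sends, at most 127 bytes land in d->name, so nothing spills into f.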

To exploit this vulnerability, an attacker picks the address of some code they want to run and repeats it for all 256 bytes of input. If the attacker has stored malicious code at address 0xbadc0de, they send that address to stdin 32 times, each copy encoded as an 8-byte pointer in the machine's byte order (32 × 8 = 256 bytes), so that every 8-byte slot from d upward, including f, gets overwritten with it:

     |  0xbadc0de  |
     +-------------+
f -> |  0xbadc0de  |
     +-------------+
     |     ...     |
     |  0xbadc0de  |
     |  0xbadc0de  |
d -> |  0xbadc0de  |
     +-------------+
     |             |
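To make the "repeat the address 32 times" step concrete, here is a rough sketch of how such an input could be assembled. It assumes a 64-bit target with the same byte order as the machine building the payload. build_payload is a hypothetical helper, and 0xbadc0de is only the placeholder address used above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fills out[0..len) with back-to-back copies of
 * target in the host's native byte order. Trailing bytes that cannot
 * hold a whole 8-byte copy are left untouched.
 * Returns the number of bytes written. */
size_t build_payload(unsigned char *out, size_t len, uint64_t target) {
    size_t written = 0;
    while (written + sizeof target <= len) {
        memcpy(out + written, &target, sizeof target);
        written += sizeof target;
    }
    return written;
}
```

Filling a 256-byte buffer with build_payload(buf, sizeof buf, 0xbadc0de) produces exactly 32 copies of the address. Writing that buffer to the vulnerable program's stdin is what yields the heap picture above.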

Then, your code reaches the line

f->fp();

which is a function call through the pointer stored in f. The machine reads the value at the memory f points to, which is now the address of the attacker's malicious code, and jumps to it as if it were a function. At that point the attacker's code is running inside your process, and you've got a lovely arbitrary code execution vulnerability on your hands.
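If you'd like to watch the hijack happen without relying on undefined behaviour or a real exploit, the sketch below simulates it. As a deliberate simplification, the name buffer and the function pointer live in one struct, so the "overflow" stays inside memory the program owns. All names here (victim, intended, hijacked, unchecked_copy, simulate_overflow) are invented for the illustration.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the two heap chunks, laid out back to back in a single
 * object so that the copy below never leaves memory we own. */
struct victim {
    char name[128];
    int (*fp)(void);
};

int intended(void) { return 1; }   /* plays the role of printName */
int hijacked(void) { return 2; }   /* plays the role of the attacker's code */

/* Copies len bytes to the start of v without checking len against
 * sizeof v->name, which is the same mistake as in the question.
 * The caller must keep len <= sizeof *v so the copy stays in bounds. */
void unchecked_copy(struct victim *v, const unsigned char *input, size_t len) {
    memcpy(v, input, len);
}

/* Builds an input that is longer than name and carries the address of
 * hijacked at the offset where fp lives, then calls through fp. */
int simulate_overflow(void) {
    struct victim v;
    unsigned char input[sizeof v];
    int (*target)(void) = hijacked;

    v.fp = intended;
    memset(input, 'A', sizeof input);
    memcpy(input + offsetof(struct victim, fp), &target, sizeof target);

    unchecked_copy(&v, input, sizeof input);
    return v.fp();   /* the program meant to call intended() here */
}
```

simulate_overflow() returns the value from hijacked, not from intended, even though the program only ever assigned intended to fp. That is the same control-flow takeover as in the heap case, minus the dependence on where the allocator happens to place f.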