This is from a CMU course on computer systems. In the following example:
typedef struct {
    int a[2];
    double d;
} struct_t;

double fun(int i) {
    volatile struct_t s;
    s.d = 3.14;
    s.a[i] = 1073741824; /* Possibly out of bounds */
    return s.d;
}
fun(0) ➙ 3.14
fun(1) ➙ 3.14
fun(2) ➙ 3.1399998664856
fun(3) ➙ 2.00000061035156
fun(4) ➙ 3.14
fun(6) ➙ Segmentation fault
The professor explains that calling fun(2) modifies the bytes of double d. However, I did not understand: (a) why the write starts to modify the bytes of d at fun(2), (b) how exactly the modified bytes lead to values like fun(2) ➙ 3.1399998664856, fun(3) ➙ 2.00000061035156, etc., up until fun(6), and (c) why it reaches a critical state exactly at fun(6). For more reference on my question, see here slide number 8 and 9. Also, there is an explanation diagram on the slide which I do not understand. I would appreciate it if you could take some time to explain.
The diagram on slide 9 represents the local memory in a call to fun. Each row represents 4 bytes (listed from right to left), and memory addresses decrease as you go down. If you were to list addresses 0, 1, 2, ... in this format, it would look like this:
|...
+--+--+--+--+
|11|10| 9| 8|
+--+--+--+--+
| 7| 6| 5| 4|
+--+--+--+--+
| 3| 2| 1| 0|
+--+--+--+--+
The diagram on slide 9 shows how s (a variable of type struct_t) is laid out in memory. The system is using 4-byte ints and 8-byte doubles. Thus s.a[0] occupies 4 bytes (row 0 in the diagram), s.a[1] another 4 (row 1), and s.d 8 bytes (rows 2 and 3).
The function accesses s.a[i]. The compiler turns this into code that takes the starting address of s.a and adds i*4 bytes to it to arrive at the selected element. In the diagram this corresponds to starting at row 0 and going i rows up. This works fine as long as i is actually a valid index in the array (in the example: either 0 or 1, as a only has 2 elements).
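The address computation can be sketched with plain offset arithmetic, which never touches out-of-bounds memory. This is only an illustration: element_offset and print_layout are hypothetical helper names, and the result depends on the platform's int size and struct padding (4-byte int and no padding before d are assumed here, as on the slide).

```c
#include <stddef.h>
#include <stdio.h>

typedef struct {
    int a[2];
    double d;
} struct_t;

/* Byte offset, from the start of the struct, that s.a[i] would refer to.
   Pure arithmetic on offsets: no memory is read or written. */
size_t element_offset(int i) {
    return offsetof(struct_t, a) + (size_t)i * sizeof(int);
}

/* Prints the offset for indices 0..3 and marks those that land inside d. */
void print_layout(void) {
    size_t d_begin = offsetof(struct_t, d);
    size_t d_end = d_begin + sizeof(double);
    for (int i = 0; i < 4; i++) {
        size_t off = element_offset(i);
        printf("a[%d] -> offset %zu%s\n", i, off,
               (off >= d_begin && off < d_end) ? " (inside d)" : "");
    }
}
```

Under those assumptions, indices 2 and 3 map to offsets 8 and 12, which are the two halves of d.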
But if i is bigger, then the code ends up accessing other parts of the memory. s.a[2] (row 2 in the diagram) refers to memory that's part of s.d, so overwriting it corrupts the value stored there (same for s.a[3]). The exact resulting value depends on the internals of the floating-point format used (which is probably IEEE 754). (I'm not familiar with that, so I don't know how exactly those bits are interpreted to get 3.1399998664856.)
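For what it's worth, the two corrupted values can be reproduced without any out-of-bounds access by patching a copy of the double's bytes. The sketch below assumes IEEE 754 binary64 doubles and a little-endian machine (as on x86); patch_word is a hypothetical helper, not something from the slides. Under those assumptions 3.14 is stored as 0x40091EB851EB851F and 1073741824 is 0x40000000. Overwriting the 4 bytes at the lower address replaces only low-order mantissa bits (0x51EB851F), which nudges the value slightly. Overwriting the 4 bytes at the higher address replaces the sign, the exponent and the top mantissa bits (0x40091EB8), which yields 2.0 plus the small remaining mantissa.

```c
#include <stdint.h>
#include <string.h>

/* Returns the double obtained by overwriting one 4-byte half of d's
   in-memory representation with value. word = 0 is the half at the
   lower address (what s.a[2] hits), word = 1 the half at the higher
   address (what s.a[3] hits). Assumes an 8-byte double. */
double patch_word(double d, int word, int32_t value) {
    unsigned char bytes[sizeof(double)];
    memcpy(bytes, &d, sizeof d);                     /* copy the bytes out */
    memcpy(bytes + 4 * word, &value, sizeof value);  /* overwrite one half */
    memcpy(&d, bytes, sizeof d);                     /* reinterpret as double */
    return d;
}
```

On such a machine, patch_word(3.14, 0, 1073741824) gives approximately 3.1399998664856 and patch_word(3.14, 1, 1073741824) gives approximately 2.00000061035156, matching fun(2) and fun(3).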
Whatever is stored at s.a[4] apparently wasn't important, as overwriting it didn't have any visible effect. But overwriting s.a[6] crashed, indicating that we destroyed something vital. That was probably the return address, i.e. the saved location that tells fun where to jump to when it's done. By overwriting it we made fun jump to invalid memory.
To confirm this (and find out why it's index 6 specifically that breaks things), you'd have to look at the code generated by the compiler. There is no general answer because it depends on the compiler in question, optimization level, what system it runs on, etc.
However, it is very common that writing out of bounds to a local array in C will corrupt the return address at some point. This is because compilers almost universally implement function calls and local ("automatic") storage by using a stack, which therefore contains local variables and return addresses intermixed.