c, architecture, malloc, memory-alignment

What is the alignment of data on the stack?


I was reading K&R C (2nd edition), page 185, and one part was hard to understand.

Although machines vary, for each machine there is a most restrictive type: if the most restrictive type can be stored at a particular address, all other types may be also. On some machines, the most restrictive type is a double; on others, int or long suffices.

I think most modern computers are byte-addressable (according to Wikipedia). The smallest data type, char, fits at any position on the stack, so it seems any data type could be stored at an arbitrary stack position. So why does such a restriction exist?

An answer to a similar question says:

CPUs often require that (or work more efficiently if) certain types of data are stored at addresses that are a multiple of some (power-of-two) value.

This addresses my question, but I couldn't understand it. Does it mean that certain addresses in the stack that are powers of two (2, 4, 8, 16, ..., 1024, 2048, ...) require certain types?
If so, why? Or, if I'm wrong, what does it refer to?


Solution

  • There are two reasons to align data:

    • Hardware requirement. Some machines can only access data in memory if it's properly aligned. Sure, you could perform multiple reads and use some bit arithmetic to emulate reading from any address, but that would be devastating to performance.
    • Performance. Even if a machine can access any data at any address, it might perform better if the data is suitably aligned.
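
    The emulation mentioned in the first bullet can be sketched in portable C. This is only an illustration: load_u32_le and load_u32_native are hypothetical helper names, and the first one assumes the value is stored in little-endian byte order.

```c
#include <stdint.h>
#include <string.h>

/* Assembles a 32-bit little-endian value from four single-byte reads.
   Byte accesses have no alignment requirement, so p may be any address. */
static uint32_t load_u32_le(const unsigned char *p) {
   return  (uint32_t)p[0]
        | ((uint32_t)p[1] << 8)
        | ((uint32_t)p[2] << 16)
        | ((uint32_t)p[3] << 24);
}

/* Copies the bytes into an aligned local variable.
   The result is interpreted in the machine's native byte order. */
static uint32_t load_u32_native(const void *p) {
   uint32_t v;
   memcpy(&v, p, sizeof v);
   return v;
}
```

    Either helper works for any address, but each call does more work than a single aligned load would, which is the cost the bullet refers to.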

    Of course, this could vary by machine, but "suitably aligned" usually means the address of an N-bit datum is evenly divisible by N/8 (its size in bytes).

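    So, on a machine where alignment matters, a 32-bit int would be placed at a memory address divisible by 4, a 64-bit pointer would be placed at a memory address divisible by 8, etc. Note that the address only has to be a multiple of that power of two; it doesn't have to be a power of two itself.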
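
    The divisibility rule can be written down as a pair of small helpers. This is a sketch, not a definitive implementation: is_aligned and align_up are hypothetical names, and the code assumes that converting a pointer to uintptr_t yields its numeric address, which holds on common flat-memory platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p is a multiple of alignment (in bytes), 0 otherwise. */
static int is_aligned(const void *p, size_t alignment) {
   return (uintptr_t)p % alignment == 0;
}

/* Rounds addr up to the next multiple of alignment.
   alignment must be a power of two for the mask trick to work. */
static uintptr_t align_up(uintptr_t addr, size_t alignment) {
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (addr + alignment - 1) & ~((uintptr_t)alignment - 1);
}
```

    For instance, align_up(0x75d4, 8) yields 0x75d8: if a 4-byte member ends at an address ending in 5d4, an 8-byte member after it has to start at the one ending in 5d8.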

    You can see this in structures.

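    #include <stdint.h>
    #include <stdio.h>
    
    typedef struct {
       uint32_t u32;   /* 4 bytes */
       void*    p;     /* 8 bytes on this machine */
       uint8_t  u8;    /* 1 byte */
    } Struct;
    
    int main(void) {
       Struct s;
       /* Address of each member, in declaration order. */
       printf("%p\n", (void*)&s.u32);
       printf("%p\n", (void*)&s.p);
       printf("%p\n", (void*)&s.u8);
       /* Address just past the end of s. */
       printf("%p\n", (void*)(&s+1));
       /* Total size of the structure, including padding. */
       printf("0x%zx\n", sizeof(s));
    }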
    
    $ gcc -Wall -Wextra -pedantic a.c -o a && ./a
    0x7ffef5f775d0
    0x7ffef5f775d8
    0x7ffef5f775e0
    0x7ffef5f775e8
    0x18
    

    This means we have the following layout (byte offsets in hex along the top):

     0 1 2 3 4 5 6 7 8 9 a b c d e f 0 1 2 3 4 5 6 7
    +-------+-------+---------------+-+-------------+ 
    | u32   |XXXXXXX| p             |*|XXXXXXXXXXXXX|   * = u8 
    +-------+-------+---------------+-+-------------+   X = unused
    

    Note the wasted space between u32 and p. This is so p is properly aligned.

    Also note the wasted space after u8. This is so the structure itself is properly aligned when you have an array of them. Without this final padding, the u32 and p of the second element of the array wouldn't be properly aligned.
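
    Both kinds of padding can also be measured with offsetof instead of being read off printed addresses. The following is a sketch; padding_before_p, padding_after_u8 and array_members_aligned are hypothetical helper names, and the pointer-to-uintptr_t conversion is assumed to yield the numeric address.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
   uint32_t u32;
   void*    p;
   uint8_t  u8;
} Struct;

/* Bytes of padding between the end of u32 and the start of p. */
static size_t padding_before_p(void) {
   return offsetof(Struct, p) - (offsetof(Struct, u32) + sizeof(uint32_t));
}

/* Bytes of padding between the end of u8 and the end of the structure. */
static size_t padding_after_u8(void) {
   return sizeof(Struct) - (offsetof(Struct, u8) + sizeof(uint8_t));
}

/* Returns 1 if u32 and p are suitably aligned in every element of a[0..n-1]. */
static int array_members_aligned(const Struct *a, size_t n) {
   for (size_t i = 0; i < n; i++) {
      if ((uintptr_t)&a[i].u32 % alignof(uint32_t) != 0) return 0;
      if ((uintptr_t)&a[i].p   % alignof(void*)    != 0) return 0;
   }
   return 1;
}
```

    With the layout drawn above, padding_before_p() would be 4 and padding_after_u8() would be 7; other machines may give different values. The trailing padding is what lets array_members_aligned hold for every element of an array, not just the first.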

    Finally, note that using

    typedef struct {
       uint32_t u32;
       uint8_t  u8;
       void*    p;
    } Struct;
    

    would have resulted in a smaller structure: 16 bytes instead of 24 on this machine, because u8 now occupies space that was previously padding.

     0 1 2 3 4 5 6 7 8 9 a b c d e f 
    +-------+-+-----+---------------+
    | u32   |*|XXXXX| p             |   * = u8 
    +-------+-+-----+---------------+   X = unused
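
    The effect of the reordering can be checked by declaring both variants side by side. This is a sketch; the type names Original and Reordered and the helper bytes_saved are hypothetical and exist only for the comparison.

```c
#include <stddef.h>
#include <stdint.h>

/* Member order from the first example. */
typedef struct {
   uint32_t u32;
   void*    p;
   uint8_t  u8;
} Original;

/* Same members, with u8 moved into the gap before p. */
typedef struct {
   uint32_t u32;
   uint8_t  u8;
   void*    p;
} Reordered;

/* Bytes saved per object by the reordering. */
static size_t bytes_saved(void) {
   return sizeof(Original) - sizeof(Reordered);
}
```

    On the machine that produced the output above, bytes_saved() would be 8 (24 minus 16). On a machine with 4-byte pointers the two layouts may well come out the same size, so the saving depends on the platform.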