Tags: c++, c, memory, struct, memory-alignment

Why alignment of struct is not equal to its size?


I have read some questions about memory alignment, but I haven't found any answer related to my question.

Here I have 2 questions:

struct MyData
{
    short Data1;
    short Data2;
    short Data3;
};
  1. Why is the alignment of MyData equal to 2 instead of 6 == sizeof(MyData)? Is there a specific reason why the alignment of a struct is calculated differently from that of a primitive type?

  2. According to Microsoft's Alignment documentation:

    An address is said to be aligned to X if its alignment is Xn+0

    Does that mean an object of MyData is aligned only if its address is a multiple of its alignment, which is 2 in this case? If so, is it true that the bigger the alignment of the struct, the fewer aligned objects of MyData we can allocate in memory? Are my assumptions correct?


Solution

  • Alignment requirements arise out of the way computer hardware works, particularly the memory bus.

    A typical modern processor does not load one byte at a time from memory. It is connected to memory by a bus, which is essentially a set of wires that carries data from one place to another. With 64 wires, 64 bits (eight bytes) can be transferred at one time. For illustration, this answer will use a system with a 64-bit bus. This is a simplified abstract view; a physical bus may have additional wires used for control and may have complicated protocols about transferring data. Also, memory cache is not considered, for simplicity.

    Since the bus can transfer eight bytes at one time, memory is organized into groups of eight bytes. Each group is given a number, so there is group 0, group 1, group 2, and so on. When the processor wants to load data from memory, it sends a group number over the bus to memory, and memory responds by sending the contents of that group.

    In the view of a program running on the computer, every byte in memory has its own address: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,… When a program requests the byte at address 12,345, the processor does not have any physical method by which it can request only that byte from memory. Instead, it calculates the group number. With bytes organized into groups of 8, byte 12,345 is in group floor(12,345 / 8) = 1543, and byte 12,345 is byte 1 of that group. So the processor asks memory to send the contents of group 1543. When the processor gets those eight bytes, it takes byte number 1 from the group and gives it to your program.
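
    The arithmetic above can be sketched in C. This is only an illustration of the simplified model in this answer: the eight-byte group size is the assumption stated earlier, and the names GROUP_SIZE, group_number, and byte_in_group are made up for the example.

```c
#include <stdint.h>

/* Assumption from the text: the bus moves 8 bytes at a time. */
#define GROUP_SIZE 8u

/* Number of the 8-byte group that contains the byte at `address`. */
static uint64_t group_number(uint64_t address)
{
    return address / GROUP_SIZE;   /* unsigned division acts as floor() */
}

/* Position of that byte inside its group, from 0 to 7. */
static unsigned byte_in_group(uint64_t address)
{
    return (unsigned)(address % GROUP_SIZE);
}
```

    With these definitions, group_number(12345) is 1543 and byte_in_group(12345) is 1, matching the numbers above.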

    Next, suppose you have a 16-bit short int. If the compiler assigns it start location 12,344, so it is in bytes 12,344 and 12,345, then, when the processor needs to load it, it loads group 1543 and takes bytes 0 and 1 from that group. This works okay. On the other hand, if the compiler assigned start location 12,343, the first byte of your short int would be byte 7 in group 1542, and the second byte would be byte 0 in group 1543. To load your short int, the processor would have to ask memory to send group 1542, and it would also have to ask memory to send group 1543, and then it would have to take byte 7 from the first group and byte 0 from the second group. What is a single load instruction in your program now requires two memory operations.
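
    To make the cost concrete, here is a minimal sketch that counts how many eight-byte groups an object touches in this simplified model. The helper name groups_touched and the constant GROUP_SIZE are hypothetical, chosen only for this illustration.

```c
#include <stdint.h>

#define GROUP_SIZE 8u   /* assumed bus width in bytes, as in the text */

/* Count how many 8-byte groups an object of `size` bytes starting at
   `address` touches. `size` must be at least 1. */
static uint64_t groups_touched(uint64_t address, uint64_t size)
{
    uint64_t first = address / GROUP_SIZE;              /* group of first byte */
    uint64_t last  = (address + size - 1) / GROUP_SIZE; /* group of last byte  */
    return last - first + 1;
}
```

    For a two-byte object, groups_touched(12344, 2) is 1, while groups_touched(12343, 2) is 2.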

    Some processors cannot handle that; they are not designed to split up single load instructions into multiple memory operations. If they get a request for such a load, they trigger a program exception instead. Some processors are designed so that such a load is not possible; the bits used for the load address do not even contain bits with position values 1, 2, or 4 (depending on the width of the object being loaded). Only the high bits are used. Those bits contain the group number, and the low bits are assumed to be zero. Other processors are designed so they can split up load instructions into multiple memory operations, but this is slow. In all of these cases, we want C programs to align their objects so this problem does not occur.

    Making a rule that each object must be aligned according to its alignment requirement solves this problem. If a two-byte short int must always start at an address that is a multiple of two bytes, then it can never be split across two memory groups. Memory group 1543 can hold four short int: one at bytes 12,344 and 12,345, another at 12,346 and 12,347, another at 12,348 and 12,349, and the last at 12,350 and 12,351.

    Observe that you could put a short int at 12,345 and 12,346, and it would fit inside one memory group. However, then we cannot make a long array of these short int, because the next one would have to be at 12,347 and 12,348, then the next at 12,349 and 12,350, and the next at 12,351 and 12,352—and that last one is split across two memory groups, 1543 and 1544. So, even though a single short int could be put at certain odd addresses, we make a rule that every short int must start at an address that is a multiple of two bytes.
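
    The array argument can be checked with a small simulation. This is a sketch under the same eight-byte-group assumption; first_split_element is a hypothetical helper, not a standard function.

```c
#include <stdint.h>

#define GROUP_SIZE 8u   /* assumed bus width in bytes, as in the text */

/* Returns the index of the first array element that is split across two
   groups, or -1 if none of the `count` elements is split. Elements are
   `elem_size` bytes each and packed back to back starting at `start`. */
static long first_split_element(uint64_t start, uint64_t elem_size, long count)
{
    for (long i = 0; i < count; i++) {
        uint64_t first_byte = start + (uint64_t)i * elem_size;
        uint64_t last_byte  = first_byte + elem_size - 1;
        if (first_byte / GROUP_SIZE != last_byte / GROUP_SIZE)
            return i;
    }
    return -1;
}
```

    Starting an array of two-byte elements at 12,345 gives first_split_element(12345, 2, 4) == 3, the element at 12,351 and 12,352. Starting at 12,344 gives -1 for any count, because an even start never produces a split.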

    Similarly, we require that each four-byte int must start at a multiple of four bytes, because this guarantees each four-byte int is in one memory group and that this is true for an array of them.

    Suppose we had a 16-byte primitive object, maybe an int128_t. In this system, we would only need eight-byte alignment for it. To see why, consider that this object always requires at least two memory groups because it is bigger than one memory group. So, even without considering alignment, it requires two memory groups. If it starts at the beginning of a memory group (an address that is a multiple of eight bytes), then it occupies exactly two memory groups. If it starts anywhere else, then it takes a few bytes in one memory group, then all eight bytes of another group, then a few bytes in a third group. So, if this object does not have eight-byte alignment, it takes more memory groups than it ideally needs. So, for a 16-byte object, requiring it to have eight-byte alignment is necessary for efficiency and is sufficient. Requiring it to have stricter alignment, a multiple of 16 bytes, would not reduce the number of memory groups required to load it.

    Now consider your structure. It contains three short int. Requiring two-byte alignment for your structure is sufficient to guarantee that each member of this structure also has two-byte alignment. That means the two-byte alignment requirement is sufficient to guarantee this structure works on the computer hardware—the processor will never be asked to load a short int member that straddles two memory groups.
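
    You can ask the compiler about this directly with C11's _Alignof and offsetof. The sketch below assumes a typical implementation where short is two bytes and the structure gets the alignment of its strictest member; the C standard does not promise that exact value. The helper names mydata_alignment and members_are_aligned are made up for the example.

```c
#include <stddef.h>

struct MyData {
    short Data1;
    short Data2;
    short Data3;
};

/* Alignment the compiler requires for the whole structure. */
static size_t mydata_alignment(void)
{
    return _Alignof(struct MyData);
}

/* Returns 1 if every member offset is a multiple of _Alignof(short), so a
   structure placed at a suitably aligned address has aligned members. */
static int members_are_aligned(void)
{
    return offsetof(struct MyData, Data1) % _Alignof(short) == 0
        && offsetof(struct MyData, Data2) % _Alignof(short) == 0
        && offsetof(struct MyData, Data3) % _Alignof(short) == 0;
}
```

    On common desktop compilers mydata_alignment() returns the same value as _Alignof(short), which is the 2 from the question.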

    Loading individual members is not the only thing we do with structures. Sometimes we copy one structure to another, so we load the entire structure and store it somewhere else. If the structure has only two-byte alignment, it is possible the compiler assigns it a starting location of 12,350, so the first member is bytes 6 and 7 in group 1543, and the second member is bytes 0 and 1 in group 1544, and the third member is bytes 2 and 3 in that group. So loading the structure will require loading two memory groups. If we required the structure to have eight-byte alignment, the compiler could not start it at 12,350; it would have to start at the start of a memory group, and it would always be possible to load the entire structure with a single memory operation.

    This is a reasonable thing to do in some circumstances, and you can use C’s _Alignas feature to request it. However, it uses more space, since then this six-byte structure requires two bytes of padding. It has not been found to be useful in general to automatically increase structure alignment requirements this way, so it is not generally done. (The C standard allows a C implementation to do this if its implementors choose.)
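
    As a sketch of what such a request might look like, the hypothetical structure below puts _Alignas(8) on its first member, which raises the alignment of the whole structure. The name MyDataAligned8 and the two helper functions are assumptions for illustration, and the sizes in the note afterwards assume a two-byte short.

```c
#include <stddef.h>

/* Hypothetical variant of MyData whose first member carries _Alignas(8);
   a structure is at least as strictly aligned as its strictest member. */
struct MyDataAligned8 {
    _Alignas(8) short Data1;
    short Data2;
    short Data3;
};

/* Size of the over-aligned structure, including trailing padding. */
static size_t aligned_size(void)
{
    return sizeof(struct MyDataAligned8);
}

/* Alignment the compiler now requires for the structure. */
static size_t aligned_alignment(void)
{
    return _Alignof(struct MyDataAligned8);
}
```

    With a two-byte short, aligned_alignment() is 8 and aligned_size() grows from 6 to 8, which is the two bytes of padding mentioned above.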

    Alignment requirements in ordinary computer hardware are powers of two, because of the nature of address calculations: We use bits to make binary numerals to represent addresses, so organizing memory into groups of eight is convenient. To calculate the group number, we divide by eight, which is easy since it involves just using the high bits of an address. If we used groups of six bytes, we would have to divide by six, which requires arithmetic operations; it cannot be done by simple bit shifts. It is theoretically possible to design computer hardware that would use groups of six bytes for memory organization and would have six bytes as an alignment requirement for many objects, but the hardware would be less efficient than our current hardware due to the extra work required to support memory groups of six bytes.
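
    The difference between the two group sizes can be sketched as follows. The function names are hypothetical, and the six-byte version exists only to show the contrast; it does not describe any real hardware.

```c
#include <stdint.h>

/* Eight-byte groups: the group number is the address with its low three
   bits dropped, and the byte position is those three bits. */
static uint64_t group_of_8(uint64_t address)  { return address >> 3; }
static unsigned offset_in_8(uint64_t address) { return (unsigned)(address & 7u); }

/* Hypothetical six-byte groups: no shift or mask gives the answer, so a
   real division and remainder are needed. */
static uint64_t group_of_6(uint64_t address)  { return address / 6u; }
static unsigned offset_in_6(uint64_t address) { return (unsigned)(address % 6u); }

/* Alignment test for a power-of-two `alignment`: the low bits must be 0. */
static int is_aligned(uint64_t address, uint64_t alignment)
{
    return (address & (alignment - 1)) == 0;
}
```

    Here group_of_8(12345) is 1543 and offset_in_8(12345) is 1, obtained with one shift and one mask. The same power-of-two property is what lets is_aligned test an address with a single mask: is_aligned(12344, 2) is true and is_aligned(12345, 2) is false.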