Search code examples
ccompiler-constructionlocalmemory-alignmentruntime-environment

Data layouts used by C compilers (the alignment concept)


Below is an excerpt from the red dragon book.

Example 7.3. Figure 7.9 is a simplification of the data layout used by C compilers for two machines that we call Machine 1 and Machine 2.

Machine 1 : The memory of Machine 1 is organized into bytes consisting of 8 bits each. Even though every byte has an address, the instruction set favors short integers being positioned at bytes whose addresses are even, and integers being positioned at addresses that are divisible by 4. The compiler places short integers at even addresses, even if it has to skip a byte as padding in the process. Thus, four bytes, consisting of 32 bits, may be allocated for a character followed by a short integer.

Machine 2: each word consists of 64 bits, and 24 bits are allowed for the address of a word. There are 64 possibilities for the individual bits inside a word, so 6 additional bits are needed to distinguish between them. By design, a pointer to a character on Machine 2 takes 30 bits — 24 to find the word and 6 for the position of the character inside the word. The strong word orientation of the instruction set of Machine 2 has led the compiler to allocate a complete word at a time, even when fewer bits would suffice to represent all possible values of that type; e.g., only 8 bits are needed to represent a character. Hence, under alignment, Fig. 7.9 shows 64 bits for each type. Within each word, the bits for each basic type are in specified positions. Two words consisting of 128 bits would be allocated for a character followed by a short integer, with the character using only 8 of the bits in the first word and the short integer using only 24 of the bits in the second word. □

The figure

I found about the concept of alignment here, here and here. What I could understand from them is as follows:

In word addressable CPUs (where size is more than a byte), there certain paddings are introduced in the data objects, such that the CPU can efficiently retrieve data from the memory with the minimum number of memory cycles.

Now the Machine 1 here is actually a byte address one. And the conditions in the Machine 1 specification are probably more difficult than a simple word addressable machine having word size of, say, 4 bytes. In such a 64 bit machine, we need to make sure that our data items are just word aligned, no more difficulty. But how to find the alignment in systems like Machine 1 (as given in the table above) where the simple concept of word alignment does not work, because it is byte addressable and has much more difficult specifications.

Moreover, I find it quite weird that in the row for double, the size of the type is more than what is given in the alignment field. Shouldn't alignment(in bits) ≥ size (in bits)? Because alignment refers to the memory actually allocated for the data object(?).

"Each word consists of 64 bits, and 24 bits are allowed for the address of a word. There are 64 possibilities for the individual bits inside a word, so 6 additional bits are needed to distinguish between them. By design, a pointer to a character on Machine 2 takes 30 bits — 24 to find the word and 6 for the position of the character inside the word." -

Moreover, how should this statement about the concept of the pointers, based on alignment, be visualized (2^6 = 64, it is fine, but how are these 6 bits correlating with the alignment concept)?


Solution

  • First of all, the machine 1 is not special at all. It is exactly like an x86-32 or 32-bit ARM.

    Moreover I find it quite weird that in the row for double the size of the type is more than what is given in the alignment field. Shouldn't alignment(in bits) ≥ size (in bits) ? Because alignment refers to the memory actually allocated for the data object (?).

    No, this isn't true. Alignment means that the address of the lowest addressable byte in the object must be divisible by the given number of bytes.

    Additionally, with C, it is also true that within arrays sizeof (ElementType) will need to be greater than or equal to the alignment of each member and sizeof (ElementType) be divisible by alignment, thus the footnote a. Therefore on the latter computer:

     struct { char a, b; }
    

    might have sizeof 16 because the characters are in distinct addressable words, whereas

    struct { char a[2];  }
    

    could be squeezed into 8 bytes.


    how should this statement about the concept of the pointers, based on alignment is to be visualized (2^6 = 64, it is fine but how is this 6 bits correlating with the alignment concept)

    As for the character pointers, the 6 bits is bogus. 3 bits are needed to choose one of the 8 bytes within the 8-byte words, so this is an error in the book. An ordinary byte would select just a word with 24 bits, and a character (a byte) pointer would select the word with 24 bits, and one of the 8-bit bytes inside the word with 3 bits.