I was trying to understand structure padding. This is what I have understood so far.
Suppose I have the following struct:
```c
struct C
{
    int a;    // 4 bytes
    double d; // 8 bytes
    int b;    // 4 bytes
};
```
Now, from my understanding, the compiler realizes that the highest byte size is 8 bytes, because of the `double`, so it lays the struct out in segments of 8 bytes. The total size of this struct then becomes 4 (size of `a`) + 4 (padding) + 8 (size of `d`) + 4 (size of `b`) + 4 (trailing padding) = 24 bytes.
So this struct will be 24 bytes. Now, on a 64-bit machine, the processor reads 8 bytes (one word) at a time. Does this mean the processor will read the struct in 3 tries? Is my understanding correct? Basically, what I am trying to understand is how padding relates to CPU performance.
Where you write "highest byte size is 8 bytes", I assume you meant "highest alignment is 8 bytes".
Such an alignment requirement (or choice*) would indeed cause the structure to be 24 bytes. This does not mean that the "processor reads the structure in 3 tries". That is not how processors work. Modern processors do not know about structures and layouts; they read individual values from cache into registers. For an instruction like `myC.d += 1.0;`, they only need to read the old `d` value, increment it, and write just that back to cache.
Similarly, the cache in a modern CPU doesn't know about structures. It works on cache lines, which have a size that is some small power of two. Since that won't divide 24, it's quite possible that a `C` instance is split over two cache lines. That would not matter much when you only use `d`, because `d` is only 8 bytes and, being 8-byte aligned, won't itself be split over cache lines.
The latter does mean that it's a bit unpredictable whether `C` members share a cache line. For an array of `C` objects, the answer has to be "sometimes they do, sometimes they don't".