
Understanding the relationship between C++ struct padding and processor performance


I was trying to understand structure padding. This is what I have understood so far.

Suppose I have the following struct:

struct C
{
 int a;    // 4 bytes
 double d; // 8 bytes
 int b;    // 4 bytes
};

Now, from my understanding, the compiler realizes that the highest byte size is 8 bytes because of the double, so the processor splits this struct into segments of 8 bytes. That gives a total size of 4 (size of a) + 4 (padding) + 8 (size of d) + 4 (size of b) + 4 (padding) = 24 bytes.

So this struct will be 24 bytes. Now, on a 64-bit machine the processor reads 8 bytes (one word) at a time. Does this mean the processor will read the struct in 3 tries? Is my understanding correct? Basically, what I am trying to understand is how padding relates to CPU performance.


Solution

  • Where you write "highest byte size is 8 bytes", I assume you meant "highest alignment is 8 bytes".

    Such an alignment requirement (or choice*) would indeed make the structure 24 bytes. It does not mean that the "processor reads the structure in 3 tries"; that is not how processors work. Modern processors do not know about structures and layouts. They read individual values from cache into registers. For a statement like myC.d += 1.0; the processor only needs to read the old value of d, increment it, and write just that value back to cache.

    Similarly, the cache in a modern CPU doesn't know about structures. It works on cache lines, whose size is a small power of two (64 bytes is common). Since no power of two is a multiple of 24, it's quite possible for a C instance to be split across two cache lines. That would not matter much when you only use d: d is 8 bytes and 8-byte aligned, and the line size is a multiple of 8, so d itself is never split across cache lines.

    This does mean it is somewhat unpredictable whether the members of a C share a cache line. For an array of C objects, the answer is "sometimes they do, sometimes they don't".