
Trailing padding in C/C++ in nested structures - is it necessary?


This is more of a theoretical question. I'm familiar with how padding and trailing padding work.

struct myStruct{
    uint32_t x;
    char*    p;
    char     c;
};

// myStruct layout will compile to
// x:       4 Bytes
// padding: 4 Bytes
// p:       8 Bytes
// c:       1 Byte
// padding: 7 Bytes
// Total:   24 Bytes
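
The layout listed above can be checked at compile time with offsetof and sizeof. This is only a minimal sketch, assuming a typical 64-bit ABI (8-byte pointers aligned to 8 bytes); print_myStruct_layout is a hypothetical helper name, not something from the question:

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */
#include <stdint.h>  /* uint32_t */
#include <stdio.h>

struct myStruct {
    uint32_t x;
    char    *p;
    char     c;
};

/* Assumption: 64-bit ABI where pointers are 8 bytes and 8-byte aligned. */
static_assert(offsetof(struct myStruct, x) == 0,  "x starts the struct");
static_assert(offsetof(struct myStruct, p) == 8,  "4 bytes of padding after x");
static_assert(offsetof(struct myStruct, c) == 16, "c directly follows p");
static_assert(sizeof(struct myStruct) == 24,      "7 bytes of trailing padding");

/* Hypothetical helper: prints the layout the compiler actually chose. */
void print_myStruct_layout(void) {
    printf("x@%zu p@%zu c@%zu sizeof=%zu alignof=%zu\n",
           offsetof(struct myStruct, x),
           offsetof(struct myStruct, p),
           offsetof(struct myStruct, c),
           sizeof(struct myStruct),
           _Alignof(struct myStruct));
}
```

On an ABI with different pointer size or alignment the static assertions would fail to compile, which is itself a quick way to see that the layout is ABI-dependent.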

There needs to be padding after x so that p is aligned, and there needs to be trailing padding after c so that the whole struct size is divisible by 8 (in order to get the right stride length). But consider this example:

struct A{
    uint64_t x;
    uint8_t  y;
};

struct B{
    struct A myStruct;
    uint32_t c;
};

// Based on all information I read on internet, and based on my tinkering
// with both GCC and Clang, the layout of struct B will look like:
// myStruct.x:       8 Bytes
// myStruct.y:       1 Byte
// myStruct.padding: 7 Bytes
// c:                4 Bytes
// padding:          4 Bytes
// total size:       24 Bytes
// total padding:    11 Bytes
// padding overhead: 46%

// my question is, why struct A does not get "inlined" into struct B,
// and therefore why the final layout of struct B does not look like this:
// myStruct.x:       8 Bytes
// myStruct.y:       1 Byte
// padding           3 Bytes
// c:                4 Bytes
// total size:       16 Bytes
// total padding:    3 Bytes
// padding overhead: 19%

Both layouts satisfy the alignment of every member, and both keep the members in the same order. In both layouts struct B has a correct stride length (divisible by 8 bytes). The only thing that differs (besides the 33% smaller size) is that struct A does not have a correct stride length in layout 2, but that should not matter, since there is clearly no array of struct A here.

I checked this layout in GCC with -O3 and -g; struct B has 24 bytes.
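
A check along these lines should reproduce the 24-byte result. It is a sketch that assumes a typical 64-bit ABI where uint64_t is 8-byte aligned; print_B_layout is a hypothetical helper name used only for illustration:

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */
#include <stdint.h>  /* uint64_t, uint8_t, uint32_t */
#include <stdio.h>

struct A {
    uint64_t x;
    uint8_t  y;
};

struct B {
    struct A myStruct;
    uint32_t c;
};

/* Assumption: uint64_t has 8-byte alignment (true on common 64-bit ABIs). */
static_assert(sizeof(struct A) == 16,      "9 data bytes + 7 bytes of trailing padding");
static_assert(offsetof(struct B, c) == 16, "c starts after all of A, padding included");
static_assert(sizeof(struct B) == 24,      "4 more bytes of trailing padding after c");

/* Hypothetical helper: prints the sizes and offsets the compiler chose. */
void print_B_layout(void) {
    printf("sizeof(A)=%zu offsetof(B,c)=%zu sizeof(B)=%zu\n",
           sizeof(struct A), offsetof(struct B, c), sizeof(struct B));
}
```

The optimization level does not change these numbers, since struct layout is fixed by the ABI rather than by the optimizer.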

My question is - is there some reason why this optimization is not applied? Is there some layout requirement in C/C++ that forbids this? Or is there some compilation flag I'm missing? Or is this an ABI thing?

EDIT: Answered.

  1. See the answer from @dbush on why the compiler cannot emit this layout on its own.
  2. The following code example uses the GCC attributes packed and aligned (as suggested by @jaskij) to manually enforce the more compact layout. Struct B_packed is only 16 bytes instead of 24 (note that this code might cause issues or run slowly when there is an array of B_packed structs, so be aware and don't blindly copy it):
struct __attribute__ ((__packed__)) A_packed{
    uint64_t x;
    uint8_t  y;
};

struct __attribute__ ((__packed__)) B_packed{
    struct A_packed myStruct;
    uint32_t c __attribute__ ((aligned(4)));
};

// Layout of B_packed will be
// myStruct.x:       8 Bytes
// myStruct.y:       1 Byte
// padding for c:    3 Bytes
// c:                4 Bytes
// total size:       16 Bytes
// total padding:    3 Bytes
// padding overhead: 19%
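
The packed layout can be verified the same way. The sketch below assumes GCC or Clang (the attributes are compiler extensions); packed_c_after_copy is a hypothetical helper added here to show that copying a whole A_packed no longer reaches c, because sizeof(struct A_packed) is 9 rather than 16:

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */
#include <stdint.h>
#include <string.h>  /* memcpy */

struct __attribute__((__packed__)) A_packed {
    uint64_t x;
    uint8_t  y;
};

struct __attribute__((__packed__)) B_packed {
    struct A_packed myStruct;
    uint32_t c __attribute__((aligned(4)));
};

/* Assumption: GCC/Clang semantics for packed and aligned. */
static_assert(sizeof(struct A_packed) == 9,       "packed: no trailing padding in A_packed");
static_assert(offsetof(struct B_packed, c) == 12, "c sits at the next 4-byte boundary");
static_assert(sizeof(struct B_packed) == 16,      "16 bytes in total");

/* Hypothetical helper: copies a full A_packed into b.myStruct and
   returns b.c afterwards, to show the copy stays within 9 bytes. */
uint32_t packed_c_after_copy(void) {
    struct B_packed b;
    struct A_packed a = { 1, 2 };
    b.c = 0x12345678;
    memcpy(&b.myStruct, &a, sizeof(struct A_packed));
    return b.c;
}
```

The trade-off is that A_packed now has alignment 1 and B_packed alignment 4, so x can end up on an address that is not a multiple of 8.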

Solution

  • is there some reason why this optimization is not applied

    If this were allowed, the value of sizeof(struct B) would be ambiguous.

    Suppose you did this:

    struct B b;
    struct A a = { 1, 2 };
    b.c = 0x12345678;
    memcpy(&b.myStruct, &a, sizeof(struct A));
    

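    You'd be overwriting the value of b.c: sizeof(struct A) is 16 because it includes A's 7 bytes of trailing padding, so memcpy writes 16 bytes starting at &b.myStruct, and in the proposed layout b.c would sit inside that range (at offset 12).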
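
    As a rough illustration, the snippet can be wrapped in a function and run against the layout compilers actually use. This is a sketch; c_after_copy is a hypothetical name, and the static assertion states the property that the proposed layout would violate:

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */
#include <stdint.h>
#include <string.h>  /* memcpy */

struct A {
    uint64_t x;
    uint8_t  y;
};

struct B {
    struct A myStruct;
    uint32_t c;
};

/* With the real layout, c begins at or after the end of the embedded A,
   so a copy of sizeof(struct A) bytes into myStruct cannot reach it.
   In the proposed layout c would be at offset 12, inside those bytes. */
static_assert(offsetof(struct B, c) >= sizeof(struct A),
              "c lies outside every byte a struct A copy can touch");

/* Hypothetical helper: performs the copy from the answer and
   returns b.c afterwards. */
uint32_t c_after_copy(void) {
    struct B b;
    struct A a = { 1, 2 };
    b.c = 0x12345678;
    memcpy(&b.myStruct, &a, sizeof(struct A));
    return b.c;
}
```

    With the layout GCC and Clang produce, c_after_copy() returns 0x12345678, since the copy ends where c begins.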