Why is padding in C valid for variables/structs allocated on the stack?

I'm reading about structure padding in C here: http://www.catb.org/esr/structure-packing/.
I don't understand why padding determined at compile time for variables/structures allocated on the stack is semantically valid in all cases. Let me provide an example. Say we have this toy code to be compiled:

int main() {
    int a;
    a = 1;
}

On X86-64 gcc -S -O0 a.c generates this assembly (unnecessary symbols removed):

main:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    $1, -4(%rbp)
    movl    $0, %eax
    popq    %rbp
    ret

In this case, how do we know that the value of %rbp, and consequently %rbp-4, is 4-byte aligned and therefore suitable for storing/loading an int?

Let's try the same example with structs.

struct st{
    char a;
    int b;
};

From the reading I infer that padded version of structure looks something like this:

struct st{
    char a;      // 1 byte
    char pad[3]; // 3 bytes
    int b;       // 4 bytes
};

So, second toy example

int main() {
    struct st s;
    s.a = 1;
    s.b = 2;
}

generates

main:
    pushq   %rbp
    movq    %rsp, %rbp
    movb    $1, -8(%rbp)
    movl    $2, -4(%rbp)
    movl    $0, %eax
    popq    %rbp
    ret

And we observe that this really is the case. But again, what guarantees that the value of rbp itself is properly aligned in an arbitrary stack frame? Isn't the value of rbp available only at run time? How can the compiler align the members of a struct if nothing is known at compile time about the alignment of the struct's start address?

Solution

As @P__J__ points out (in a now deleted answer), how a C compiler generates code is an implementation detail. Since you tagged this as an ABI question, your real question is "When GCC is targeting Linux, how is it allowed to assume that RSP has any particular minimum alignment?". The 64-bit ABI that Linux uses is the AMD64 (x86-64) System V ABI. Just before CALLing an ABI-compliant¹,² function (including main), the stack is guaranteed to be aligned to at least 16 bytes (32 or 64 bytes, depending on the types passed to the function). The ABI states:

3.2.2 The Stack Frame

In addition to registers, each function has a frame on the run-time stack. This stack grows downwards from high addresses. Figure 3.3 shows the stack organization. The end of the input argument area shall be aligned on a 16 (32 or 64, if __m256 or __m512 is passed on stack) byte boundary. In other words, the value (%rsp + 8) is always a multiple of 16 (32 or 64) when control is transferred to the function entry point. The stack pointer, %rsp, always points to the end of the latest allocated stack frame.

You may ask why they mention RSP+8 being a multiple of 16 (and not RSP+0). This is because CALLing a function implies that an 8-byte return address is placed on the stack by the CALL instruction itself. Whether a function is called or jumped to (i.e. a tail call), the code generator always assumes that just prior to executing the first instruction in a function the stack is misaligned by 8 relative to a 16-byte boundary. The stack is still guaranteed to be aligned on an 8-byte boundary, though, and if you subtract 8 from RSP you are guaranteed to be 16-byte aligned once again.

Note that in the code below the stack is aligned on a 16-byte boundary after the PUSHQ, because the PUSH instruction decreases RSP by 8 and thereby restores 16-byte alignment:

main:
                             # <------ Stack pointer (RSP) misaligned by 8 bytes
    pushq   %rbp
                             # <------ Stack pointer (RSP) aligned to 16 byte boundary
    movq    %rsp, %rbp
    movb    $1, -8(%rbp)
    movl    $2, -4(%rbp)
    movl    $0, %eax
    popq    %rbp
    ret
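
The arithmetic behind the two annotated points above can be sketched in C. This is only a model of the address arithmetic, not something the compiler emits; the helper names (rsp_after_call, rsp_after_push, is_aligned) are made up for illustration, and the 16-byte precondition is the ABI guarantee quoted earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models CALL, which pushes an 8-byte return address.
   The assert encodes the ABI precondition that RSP is 16-byte aligned
   immediately before the CALL. */
uint64_t rsp_after_call(uint64_t rsp_before_call)
{
    assert(rsp_before_call % 16 == 0);
    return rsp_before_call - 8;
}

/* Hypothetical helper: models `pushq %rbp`, which decreases RSP by 8. */
uint64_t rsp_after_push(uint64_t rsp)
{
    return rsp - 8;
}

/* Returns 1 when addr is a multiple of alignment. */
int is_aligned(uint64_t addr, uint64_t alignment)
{
    return addr % alignment == 0;
}
```

For any 16-byte aligned starting value, rsp_after_call yields an address that is 8-byte but not 16-byte aligned, and applying rsp_after_push to that result yields a 16-byte aligned address again.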

For 64-bit code, the conclusion one can draw from all this is that although the actual value of the stack pointer is known only at run time, the ABI allows us to infer that its value upon entry to a function has a particular alignment, and the compiler's code generator can use that to its advantage when placing a struct on the stack.
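
If you want to convince yourself of this, a small check along the following lines may help. It is a sketch, assuming a C11 compiler; the function name local_struct_is_aligned is made up for illustration, and struct st is the one from the question. The static_assert covers the part the compiler settles at compile time (the offset of b inside the struct), while the function checks the part that depends on the run-time stack address.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

struct st {
    char a;
    int  b;
};

/* Compile-time part: the padding inserted before b makes its offset
   a multiple of int's alignment. */
static_assert(offsetof(struct st, b) % alignof(int) == 0,
              "b must sit at an int-aligned offset");

/* Run-time part (hypothetical helper): returns 1 when a stack-allocated
   struct st, and its member b, are at suitably aligned addresses. */
int local_struct_is_aligned(void)
{
    struct st s;
    s.a = 1;
    s.b = 2;
    return (uintptr_t)&s % alignof(struct st) == 0
        && (uintptr_t)&s.b % alignof(int) == 0;
}
```

An aligned start address plus an aligned member offset gives an aligned member address, which is why the compiler only needs the ABI's guarantee about RSP and not its actual value.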

When a function's stack alignment isn't enough for a variable's alignment?

A logical question is - if the stack alignment that can be guaranteed upon entry to a function is not enough for the alignment of a struct or data type placed on the stack, what does the GCC compiler do? Consider this revision to your program:

struct st{
    char a;      // 1 byte
    char pad[3]; // 3 bytes
    int b;       // 4 bytes
};

int main() {
    struct st s __attribute__(( aligned(32)));
    s.a = 1;
    s.b = 2;
}

We've told GCC that the variable s should be 32-byte aligned. A function that can guarantee 16-byte stack alignment doesn't guarantee 32-byte alignment (whereas 32-byte alignment does guarantee 16-byte alignment, since 32 is evenly divisible by 16). The GCC compiler will have to generate a function prologue so that s can be properly aligned. You can look at the unoptimized output on godbolt for this program to see how GCC achieves this:

main:
        pushq   %rbp
        movq    %rsp, %rbp
        andq    $-32, %rsp    # ANDing RSP with -32 (0xFFFFFFFFFFFFFFE0) 
                              # rounds RSP down to next 32 byte boundary
                              # by zeroing the lower 5 bits of RSP.
        movb    $1, -32(%rsp) 
        movl    $2, -28(%rsp)
        movl    $0, %eax
        leave
        ret
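
The masking done by the ANDQ above can be written in C as follows. This is a sketch of the arithmetic only; align_down is a hypothetical helper name, and it assumes the alignment is a power of two, as it is for the aligned attribute.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring `andq $-32, %rsp`: rounds addr down to a
   multiple of alignment. alignment must be a nonzero power of two. */
uint64_t align_down(uint64_t addr, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* ~(alignment - 1) has the same bit pattern as -alignment in two's
       complement, e.g. 0xFFFFFFFFFFFFFFE0 for alignment 32. */
    return addr & ~(alignment - 1);
}
```

For example, align_down(0x7ffd1238, 32) gives 0x7ffd1220. Once RSP is a multiple of 32, the address -32(%rsp) used for s in the listing is a multiple of 32 as well.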

Footnotes

¹The AMD64 System V ABI is also used by 64-bit Solaris, macOS, and BSD as well as Linux.
²The 64-bit Microsoft Windows calling convention (ABI) guarantees that the stack is 16-byte aligned prior to a function call (and so misaligned by 8 bytes just before the first instruction of the function executes).