How does defining an explicit destructor for a C++ struct affect calling conventions?

A coworker asked me this question after noticing some curious behavior with C++ structs.

Take this trivial code:

struct S {
  int i;
#ifdef TEST
  ~S() {}
#endif
};

void foo (S s) {
  (void)s;
}

int main () {
  foo(S());
  return 0;
}

I generated the assembly once without the explicit destructor:

g++-4.7.2 destructor.cc -S -O0 -o destructor_no.s

and later including it:

g++-4.7.2 destructor.cc -DTEST -S -O0 -o destructor_yes.s

This is the code[1] for main in destructor_no.s:

main:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    $0, %eax
    movl    %eax, %edi
    call    _Z3foo1S   // call to foo()
    movl    $0, %eax
    popq    %rbp
    ret

While, instead, if the destructor is defined explicitly:

main:
    pushq   %rbp
    movq    %rsp, %rbp
    subq    $16, %rsp
    movl    $0, -16(%rbp)
    leaq    -16(%rbp), %rax
    movq    %rax, %rdi
    call    _Z3foo1S   // call to foo()
    leaq    -16(%rbp), %rax
    movq    %rax, %rdi
    call    _ZN1SD1Ev  // call to S::~S()
    movl    $0, %eax
    leave
    ret

Now, my assembly knowledge is a bit rusty, but it seems to me that:

In the first case, the struct is passed "by value": its contents are copied into %edi, the low 32 bits of %rdi, which is the first integer argument register in the System V x86-64 ABI.
In the second case, the struct is instead allocated on the caller's stack, and foo() is called with a pointer to it in %rdi.
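
To make the second case concrete, here is a hand-written C++ approximation of what that listing appears to do. It is only a sketch: foo_lowered, call_site and the events log are hypothetical names introduced for illustration, and the constructor exists only so the order of events can be recorded.

```cpp
#include <string>
#include <vector>

// Illustrative log of what happens, in order (not part of the original code).
std::vector<std::string> events;

struct S {
  int i;
  S() : i(0) { events.push_back("construct"); }
  ~S() { events.push_back("destroy"); }
};

// Hypothetical stand-in for foo(S): it receives the address of the
// caller's temporary, which is what %rdi holds in the second listing.
void foo_lowered(S* s) {
  (void)s;
  events.push_back("foo");
}

// Hand-written equivalent of the statement `foo(S());`.
void call_site() {
  S tmp;              // subq $16, %rsp ; movl $0, -16(%rbp)
  foo_lowered(&tmp);  // leaq -16(%rbp), %rax ; movq %rax, %rdi ; call
}                     // the destructor runs here: call _ZN1SD1Ev
```

Calling call_site() records "construct", "foo", "destroy" in that order, which mirrors the store, the call to foo() and the call to S::~S() in the assembly.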

Why is there such a difference?

Notes:

The same behavior occurs with gcc-4.6.3 and clang 3.1.
Of course, if optimizations are enabled, the call to foo() is optimized away completely in both cases.
An interesting pattern emerges when adding more variables to the struct, if no destructor is explicitly provided.

Up to 4 ints (= 16 bytes) are passed through the argument registers:

pushq   %rbp
movq    %rsp, %rbp
subq    $16, %rsp
movl    $0, -16(%rbp)
movl    $0, -12(%rbp)
movl    $0, -8(%rbp)
movl    $0, -4(%rbp)
movq    -16(%rbp), %rdx
movq    -8(%rbp), %rax
movq    %rdx, %rdi
movq    %rax, %rsi
call    _Z3foo1S

but as soon as I add a fifth int to the struct, the argument to the function, still passed "by value", is now on the stack:

pushq   %rbp
movq    %rsp, %rbp
subq    $56, %rsp
movl    $0, -32(%rbp)
movl    $0, -28(%rbp)
movl    $0, -24(%rbp)
movl    $0, -20(%rbp)
movl    $0, -16(%rbp)
movq    -32(%rbp), %rax
movq    %rax, (%rsp)
movq    -24(%rbp), %rax
movq    %rax, 8(%rsp)
movl    -16(%rbp), %eax
movl    %eax, 16(%rsp)
call    _Z3foo1S
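
The switch between four and five ints is consistent with the System V x86-64 rule that an aggregate larger than two eightbytes (16 bytes) is passed in memory. The following is a rough sketch of that size check only, assuming a 4-byte int; FourInts, FiveInts and fits_in_two_eightbytes are hypothetical names, and the real classification in the ABI looks at more than the size.

```cpp
#include <cstddef>

struct FourInts { int a, b, c, d; };     // 16 bytes when int is 4 bytes
struct FiveInts { int a, b, c, d, e; };  // 20 bytes when int is 4 bytes

// Hypothetical helper: checks only the size limit of two eightbytes.
// It deliberately ignores the rest of the ABI's classification rules.
template <typename T>
constexpr bool fits_in_two_eightbytes() {
  return sizeof(T) <= 2 * 8;
}

static_assert(fits_in_two_eightbytes<FourInts>(),
              "four ints fit in %rdi and %rsi");
static_assert(!fits_in_two_eightbytes<FiveInts>(),
              "five ints exceed 16 bytes and go on the stack");
```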

[1] I have removed some lines that I think are unnecessary for the purpose of this question.

Solution

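In C++03 terms, once you define a destructor your struct is no longer a POD type. An object of the variant without the destructor behaves like a C struct variable (so it is simply passed around by value, in registers when it is small enough), while the variant with the user-defined destructor behaves like a C++ object. More precisely, the Itanium C++ ABI, which GCC and Clang follow on x86-64 Linux, requires that a type with a non-trivial copy constructor or destructor be passed by invisible reference: the caller constructs a temporary in memory, passes its address, and destroys the temporary after the call returns. That is the sequence in your second listing.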
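
If you have a C++11 compiler, you can ask it which category a type falls into. This is a minimal sketch rather than the ABI's exact test: Plain, WithDtor and trivial_for_calls are hypothetical names, and the helper only approximates the ABI's notion of "trivial for the purposes of calls" (it ignores move constructors and deleted functions).

```cpp
#include <type_traits>

struct Plain {     // same members as S compiled without -DTEST
  int i;
};

struct WithDtor {  // same members as S compiled with -DTEST
  int i;
  ~WithDtor() {}   // user-provided, therefore non-trivial
};

// Hypothetical helper approximating "trivial for the purposes of calls":
// both the destructor and the copy constructor must be trivial.
template <typename T>
constexpr bool trivial_for_calls() {
  return std::is_trivially_destructible<T>::value &&
         std::is_trivially_copy_constructible<T>::value;
}

static_assert(trivial_for_calls<Plain>(),
              "no user-provided destructor: eligible for register passing");
static_assert(!trivial_for_calls<WithDtor>(),
              "user-provided destructor: passed by invisible reference");
```

Note that an empty destructor body still counts as user-provided, which is why ~S() {} is enough to change how the argument is passed.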