assembly x86-64 stack-memory stack-frame

Trying to understand contents of stack frame x86-64

I’m working on practice problem 3.34 in Computer Systems: A Programmer's Perspective and trying to understand exactly what is happening. The question states: "Consider a function P, which generates local values, named a0-a7. It then calls function Q using these generated values as arguments. GCC produces the following code for the first part of P." We are given the following assembly code:

/* long P(long x)
 * x in %rdi */
P:
  pushq   %r15
  pushq   %r14
  pushq   %r13
  pushq   %r12
  pushq   %rbp
  pushq   %rbx
  subq    $24, %rsp
  leaq    1(%rdi), %r15
  leaq    2(%rdi), %r14
  leaq    3(%rdi), %r13
  leaq    4(%rdi), %r12
  leaq    5(%rdi), %rbp
  leaq    6(%rdi), %rax
  movq    %rax, (%rsp)
  leaq    7(%rdi), %rdx
  movq    %rdx, 8(%rsp)
  movl    $0, %eax
  call    Q

So far, this is what I understand: the instructions pushq %r15 through pushq %rbx push those registers onto the stack to preserve their values, which are then restored to their respective registers when procedure P returns (since they are callee-saved registers). I see that the instruction subq $24, %rsp allocates 24 bytes of space on the stack.

I have two questions though:

1. What are the load effective address lines doing? It seems to me that they take the memory location addressed by long x and store the new memory address (after adding 1 or 2 or ... 7) in the various callee-saved registers. Is this correct? I'm a bit confused about the values they store. Is there any significance to them? Also, what will function Q do with these registers? How does it know to use them as arguments, since they don't seem to be the argument registers? Only long x is passed on as an argument (as it is in register %rdi).
2. What are the contents of the stack? I see that 24 bytes were allocated, but I can't seem to account for all of that space :( I understand the stack to look like this:

???????????????????????????????????:16
The result of 7(%rdi) (argument a7):8
The result of 6(%rdi) (argument a6):0 <--- %rsp

I can't seem to account for what is contained in bytes 16-23 :(

Thank you so much in advance, I'm really struggling with this one.

Solution

First, note that there is an erratum for this practice problem. The local variables are not passed as arguments to Q; rather Q is being called with no arguments. So that explains why the variables aren't in the argument-passing registers.

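(The strange zeroing of eax may be explained by the question "Differences in the initialization of the EAX register when calling a function in C and C++": they might have accidentally declared void Q(); instead of void Q(void);. With the unprototyped declaration, the compiler has to allow for Q being variadic, and the x86-64 System V ABI has the caller pass in al an upper bound on the number of vector registers used for arguments, which is 0 here. I'm not sure why the compiler emitted movl $0, %eax instead of the more efficient xorl %eax, %eax; it looks like optimizations are on, and that's a very basic optimization.)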

Now as for lea, it's really just an arithmetic instruction, and compilers tend to use it that way. See "What's the purpose of the LEA instruction?". So leaq 1(%rdi), %r15 simply adds 1 to the value in rdi and writes the result to r15; it never accesses memory. Whether the value in rdi represents an address or a number or something else is irrelevant to the machine. Since rdi contains the argument x, this is effectively doing:

a0 = x + 1;
a1 = x + 2;
a2 = x + 3;
// ...

The alternative would be something like movq %rdi, %r15 ; addq $1, %r15 which takes two instructions instead of one.

Of course, these values are being put in callee-saved registers (or memory, for the last two) so that they are not destroyed by the call to Q().
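To make the "values live across the call" point concrete, here is a minimal C sketch of what the source of P might look like. This is only a guess at the shape of the code, not the book's actual source: the names v1..v7, the body of Q, and the return expression are all hypothetical. The comments show which line of the listing above each value corresponds to; a real compiler given this exact source is free to generate something different.

```c
/* Hypothetical stand-in for Q. Per the erratum it takes no arguments;
   the body here is an assumption made only so the sketch links. */
long Q(void) {
    return 0;
}

/* Hypothetical reconstruction of P. The names v1..v7 and the return
   expression are illustrative, not taken from the book. */
long P(long x) {
    long v1 = x + 1;   /* listing: leaq 1(%rdi), %r15 */
    long v2 = x + 2;   /* listing: leaq 2(%rdi), %r14 */
    long v3 = x + 3;   /* listing: leaq 3(%rdi), %r13 */
    long v4 = x + 4;   /* listing: leaq 4(%rdi), %r12 */
    long v5 = x + 5;   /* listing: leaq 5(%rdi), %rbp */
    long v6 = x + 6;   /* listing: stored at (%rsp)   */
    long v7 = x + 7;   /* listing: stored at 8(%rsp)  */

    Q();               /* may clobber any caller-saved register */

    /* All seven values are used after the call, so each one has to sit
       in a callee-saved register or in P's own stack frame. */
    return v1 + v2 + v3 + v4 + v5 + v6 + v7;
}
```

Because every value is still needed after Q() returns, none of them can be left in a caller-saved register such as rax or rdx, which is why the last two end up in memory once the callee-saved registers run out.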

As for the stack, the x86-64 ABI requires 16-byte stack alignment. The stack pointer was a multiple of 16 before the call to P, and it must again be a multiple of 16 when we call Q. Our caller's call P pushed 8 bytes to the stack (the return address), and the six register pushes in the prologue push 48 bytes more, for 56 bytes in total. So in order to end up with a multiple of 16, we must adjust the stack pointer by 8 more than a multiple of 16 (i.e. an odd multiple of 8). We need 16 bytes for local variables, so we must adjust the stack pointer by 24. That leaves 8 bytes of stack that just won't be used for anything, which is your ?????? at 16(%rsp).
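The alignment arithmetic can be sketched as a small C helper. The function name frame_adjust and its parameters are made up for illustration. It assumes the simple model described above (an 8-byte return address, 8 bytes per pushed register, 16-byte alignment required at the call); it is not how GCC actually computes frame sizes.

```c
/* Hypothetical helper: smallest stack adjustment, in bytes, that holds
   local_bytes of locals and leaves the stack pointer 16-byte aligned.
   Assumes the stack pointer was 16-byte aligned just before the call
   into this function. */
long frame_adjust(long pushed_regs, long local_bytes) {
    long used = 8 + 8 * pushed_regs;        /* return address + pushes */
    long adj = (local_bytes + 7) / 8 * 8;   /* round locals up to 8    */

    /* Grow the adjustment in 8-byte steps until the total is a
       multiple of 16. At most one step is ever needed. */
    while ((used + adj) % 16 != 0) {
        adj += 8;
    }
    return adj;
}
```

For P, frame_adjust(6, 16) gives 24: the return address and six pushes use 56 bytes, 16 more bytes of locals would give 72, which is not a multiple of 16, so 8 bytes of padding are added to reach 80.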