I am thinking about how to compile arrays in x86-64 assembly.
I am reading "Computer Systems - A Programmer's Perspective" and the authors give the following formula:
E[i] -> `movl (%rdx, %rcx, 4i), %eax`
But I just used Compiler Explorer and GCC 12.2 to see the generated assembly for the following program:
int main() {
int arr[] = {3,4,5};
for (int i = 0; i < 3; i++) {
arr[i];
}
}
The generated code for this is:
main:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-16], 3
mov DWORD PTR [rbp-12], 4
mov DWORD PTR [rbp-8], 5
mov DWORD PTR [rbp-4], 0
jmp .L2
.L3:
add DWORD PTR [rbp-4], 1
.L2:
cmp DWORD PTR [rbp-4], 2
jle .L3
mov eax, 0
pop rbp
ret
Here is a quote from a stackoverflow answer that explains what the registers RBP and RSP are doing:
rbp is the frame pointer on x86_64. In your generated code, it gets a snapshot of the stack pointer (rsp) so that when adjustments are made to rsp (i.e. reserving space for local variables or pushing values on to the stack), local variables and function parameters are still accessible from a constant offset from rbp.
Is RBP 16 here? For example rbp - 16 = 0
, rbp - 12 = 4
, rbp - 8 = 8
which kind of satisfies the formula.
Anyway, could you please give me an idea of how the formula from the textbook is related to the generated code from GCC?
arr[i];
as a C statement compiles to zero instructions, even in a debug build, since arr
isn't volatile
. All the asm you're seeing is just locals at fixed locations.
If you're used to looking at AT&T syntax, use the output dropdown on Godbolt to uncheck "Intel Syntax".
Also, you could get useful asm to look at by writing a function that takes a pointer arg and sums it for example, so you can enable light optimization without having the array optimize away to just returning a constant. You could use volatile
to force it not to if you really want to see asm that stores array initializers to the stack instead of just taking a function arg. You're writing this to look at the asm, not run it, so don't write a main()
. (In general see How to remove "noise" from GCC/clang assembly output?)
int sumarr(int *arr){
register int sum = 0; // register keyword is useful in GCC if you're going to disable optimization
for (int i=0 ; i<1024 ; i++){
sum += arr[i];
}
return sum;
}
Godbolt - gcc 12 with -Og
uses an indexed addressing mode like you expect; most other optimization levels do a pointer increment.
# gcc -Og
sumarr:
movl $0, %eax
movl $0, %edx
jmp .L2 # jump to the loop condition, redundant because 0 <= 1023 is known true; could just fall through into the loop, but -Og maybe intentionally preserves that execution? It doesn't in general give consistent debugging.
.L3: # do {
movslq %eax, %rcx # sign-extend int to intptr_t since -Og doesn't optimize much
addl (%rdi,%rcx,4), %edx # edx += *(rdi + rcx*4) = array[i]
addl $1, %eax
.L2: # loop entry point for first iteration
cmpl $1023, %eax
jle .L3 # }while(i<=1023)
movl %edx, %eax # at low optimization levels, GCC didn't sum into the return-value register in the first place.
ret
With a normal level of optimization like -O2
, we get a pointer increment:
# GCC -O2 -fno-tree-vectorize (-O2 by itself would vectorize with SIMD)
sumarr:
leaq 4096(%rdi), %rdx # endp = ptr + len
xorl %eax, %eax # sum = 0
.L2: # do {
addl (%rdi), %eax
addq $4, %rdi
cmpq %rdx, %rdi
jne .L2 # }while(p != endp)
ret
The amount of memory reserved for the local variables is always a multiple of 16 bytes, to keep the stack aligned to 16 bytes. in reality it should be 0 but it needs to be a multiple of 16?
That's an over-simplification, but yes in this case 0*16 = 0
is a multiple of 16 which maintains stack alignment for RSP after push rbp
. The actual space it uses is in the red zone, 128 bytes below RSP.
The amount of stack space reserved for locals is always a multiple of 8, either odd or even depending on how many pushes it did, so the total movement of RSP is 16*n + 8
.
In this case where it doesn't need to preserve any call-preserved registers like RBX, it only pushed RBP (because -fno-omit-frame-pointer
is the default at -O0
.)
It doesn't sub rsp, 16
because the x86-64 SysV ABI includes a red-zone; the 128 bytes below RSP are safe from signal handlers or anything stepping on them asynchronously, so can be used without "reserving". Use -mno-red-zone
to stop GCC doing that, if you want to see it reserve space for an array.