I'm analysing a piece of inefficient code, but some of the generated assembly confuses me.
Original code:
#include <string.h>
void lowwer(char *str) {
    for (int i = 0; i < strlen(str); ++i) {
        str[i] -= ('A' - 'a');
    }
}
Assembly code (generated by clang 13 with -Og option):
lowwer:
pushq %r14 # save callee-saved registers
pushq %rbx
pushq %rax
# guard do while
cmpb $0, (%rdi) # compare str[0] with '\0' (check if strlen(str) == 0)
je .LBB0_3
# loop initialization
movq %rdi, %r14 # %r14 = str
xorl %ebx, %ebx # clear %rbx (for more compact encoding)
.LBB0_2: # =>This Inner Loop Header: Depth=1
addb $32, (%r14,%rbx) # subtract -32 from str[i] ('A' - 'a' = -32)
addq $1, %rbx # ++i
movq %r14, %rdi # seems meaningless here?
callq strlen@PLT
cmpq %rbx, %rax # check i < strlen(str)
ja .LBB0_2
.LBB0_3: # end
addq $8, %rsp # ???
popq %rbx # restore callee-saved registers
popq %r14
retq
What is movq %r14, %rdi doing? It seemed meaningless because %r14, which holds the string pointer, and %rdi are the same.
And addq $8, %rsp? I have no clue what it is for.

rdi is a caller-saved register and is hence trashed by the call to strlen. To preserve its contents, the compiler emitted code to move its contents to r14, a callee-saved register that strlen must leave intact, copying it back once every iteration as the argument to strlen.
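For comparison, here is a minimal sketch of a variant that calls strlen only once, so no pointer has to survive a call inside the loop. The name lower_hoisted is hypothetical and not part of the original code. This is not what clang generated, and it is not equivalent for every input: a byte with value 0xE0 becomes 0 after adding 32, which would shorten the string in the original loop but not here. A compiler generally cannot prove that the stores leave the length unchanged, which is presumably why the call stays inside the loop.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of the function from the question: strlen is
   called once, before the loop, so the loop body contains no call. */
void lower_hoisted(char *str) {
    size_t len = strlen(str);          /* single call, hoisted out of the loop */
    for (size_t i = 0; i < len; ++i) {
        str[i] -= ('A' - 'a');         /* same update as the original: adds 32 */
    }
}
```

With a buffer holding "HELLO", calling lower_hoisted on it leaves "hello" in the buffer, the same result as the original function for plain uppercase ASCII input.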
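The addq $8, %rsp instruction releases stack space previously allocated by pushq %rax. This stack space was allocated to satisfy the stack alignment requirements imposed by the amd64 SysV ABI: %rsp must be 16-byte aligned at each call instruction. On entry, the return address leaves %rsp 8 bytes off a 16-byte boundary, and the two pushes of %r14 and %rbx keep it there, so a third 8-byte push is needed before calling strlen. The value of %rax stored by that push is never used.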
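The alignment arithmetic can be checked with a small sketch. The helper name misalignment_before_call is hypothetical. It assumes the caller kept %rsp 16-byte aligned at its own call instruction, so the 8-byte return address is the only thing on the stack at function entry.

```c
#include <stddef.h>

/* Returns how many bytes %rsp is off a 16-byte boundary after `pushes`
   8-byte pushes, counted from function entry. Assumes the caller had
   %rsp 16-byte aligned when it executed its call instruction. */
size_t misalignment_before_call(size_t pushes) {
    const size_t return_address = 8;   /* pushed by the caller's call */
    return (return_address + 8 * pushes) % 16;
}
```

With the two pushes of %r14 and %rbx the result is 8, so the stack is misaligned. The third push, pushq %rax, brings it to 0, which is what the ABI requires before callq strlen.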
Refer to the amd64 SysV ABI supplement for the full calling convention and a list of caller/callee saved registers.