this is the assembly code i am supposed to translate: f1:
subl $97, %edi
xorl %eax, %eax
cmpb $25, %dil
setbe %al
ret
heres the c code I wrote that I think is equivalent.
int f1(int y){
int x = y-97;
int i = 0;
if(x<=25){
x = i;
}
return x;
}
and heres what I get from compiling the C code.
_f1: ## @f1
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
## kill: def %edi killed %edi def %rdi
leal -97(%rdi), %ecx
xorl %eax, %eax
cmpl $123, %edi
cmovgel %ecx, %eax
popq %rbp
retq
.cfi_endproc
I was wondering if this was correct / what should be different and if anyone could help explain how jmps work as I am also trying to translate this assembly code and have gotten stuck f2:
cmpl $1, %edi
jle .L6
movl $2, %edx
movl $1, %eax
jmp .L5
.L8:
movl %ecx, %edx
.L5:
imull %edx, %eax
leal 1(%rdx), %ecx
cmpl %eax, %edi
jg .L8
.L4:
cmpl %edi, %eax
sete %al
movzbl %al, %eax
ret
.L6:
movl $1, %eax
jmp .L4
gcc8.3 -O3 emits exactly the asm in the question for this way of writing the range check using the unsigned-compare trick.
int is_ascii_lowercase_v2(int y){
unsigned char x = y-'a';
return x <= (unsigned)('z'-'a');
}
Narrowing to 8-bit after the int
subtract matches the asm more exactly, but it's not necessary for correctness or even to convince compilers to use a 32-bit sub
. For unsigned char y
, the upper bytes of RDI are allowed to hold arbitrary garbage (x86-64 System V calling convention), but carry only propagates from low to high with sub and add.
The low 8 bits of the result (which is all the cmp
reads) would be the same with sub $'a', %dil
or sub $'a', %edi
.
Writing it as a normal range-check also gets gcc to emit identical code, because compilers know how optimize range-checks. (And gcc chooses to use 32-bit operand-size for the sub
, unlike clang which uses 8-bit.)
int is_ascii_lowercase_v3(char y){
return (y>='a' && y<='z');
}
On the Godbolt compiler explorer, this and _v2
compile as follows:
## gcc8.3 -O3
is_ascii_lowercase_v3: # and _v2 is identical
subl $97, %edi
xorl %eax, %eax
cmpb $25, %dil
setbe %al
ret
Returning a compare result as an integer, instead of using an if
, much more naturally matches the asm.
But even writing it "branchlessly" in C won't match the asm unless you enable optimization. The default code-gen from gcc/clang is -O0
: anti-optimize for consistent debugging, storing/reloading everything to memory between statements. (And function args on function entry.) You need optimization, because -O0 code-gen is (intentionally) mostly braindead, and nasty looking. See How to remove "noise" from GCC/clang assembly output?
## gcc8.3 -O0
is_ascii_lowercase_v2:
pushq %rbp
movq %rsp, %rbp
movl %edi, -20(%rbp)
movl -20(%rbp), %eax
subl $97, %eax
movb %al, -1(%rbp)
cmpb $25, -1(%rbp)
setbe %al
movzbl %al, %eax
popq %rbp
ret
gcc and clang with optimization enabled will do if-conversion to branchless code when it's efficient. e.g.
int is_ascii_lowercase_branchy(char y){
unsigned char x = y-'a';
if (x < 25U) {
return 1;
}
return 0;
}
still compiles to the same asm with GCC8.3 -O3
is_ascii_lowercase_branchy:
subl $97, %edi
xorl %eax, %eax
cmpb $25, %dil
setbe %al
ret
We can tell that the optimization level was at least gcc -O2
. At -O1
, gcc uses the less efficient setbe / movzx instead of xor-zeroing EAX ahead of setbe
is_ascii_lowercase_v2:
subl $97, %edi
cmpb $25, %dil
setbe %al
movzbl %al, %eax
ret
I could never get clang to reproduce exactly the same sequence of instructions. It likes to use add $-97, %edi
, and cmp with $26
/ setb
.
Or it will do really interesting (but sub-optimal) things like this:
# clang7.0 -O3
is_ascii_lowercase_v2:
addl $159, %edi # 256-97 = 8-bit version of -97
andl $254, %edi # 0xFE; I haven't figured out why it's clearing the low bit as well as the high bits
xorl %eax, %eax
cmpl $26, %edi
setb %al
retq
So this is something involving -(x-97)
, maybe using the 2's complement identity in there somewhere (-x = ~x + 1
).