I have this loop that essentially adds two dynamic-width integers (in reality, it is unrolled a bit, but that does not matter here). Register RCX contains the destination address, RDX contains the source address, and R8 contains the length of the arrays in 64-bit words.
clc # Clear CF flag before beginning
.Lloop0:
movq (%rdx), %rax # Load memory for next addition
adcq %rax, (%rcx) # Perform addition with carry (i.e. using CF flag)
leaq 8(%rcx), %rcx # Increment destination address (without affecting CF)
leaq 8(%rdx), %rdx # Increment source address (without affecting CF)
leaq -1(%r8), %r8 # Decrement length (without affecting CF)
testq %r8, %r8 # Test if length is zero (clears CF!)
jne .Lloop0
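For reference, the intended semantics can be written as a minimal portable C sketch. The name add_n_ref is hypothetical, and least-significant-word-first order is assumed, matching a loop that starts at the lowest address. The carry variable plays the role of CF: it has to survive unchanged from one iteration to the next, which is exactly what the TEST at the bottom of the loop breaks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of the question's loop:
   dst[0..n) += src[0..n), least significant word at index 0.
   Returns the carry out of the most significant word (0 or 1). */
static unsigned add_n_ref(uint64_t *dst, const uint64_t *src, size_t n)
{
    unsigned carry = 0;                  /* CF, cleared by CLC */
    for (size_t i = 0; i < n; i++) {
        uint64_t a = dst[i];
        uint64_t sum = a + src[i];       /* wraps modulo 2^64 */
        unsigned c1 = sum < a;           /* carry out of a + src[i] */
        uint64_t res = sum + carry;      /* add the incoming carry */
        unsigned c2 = res < sum;         /* carry out of adding CF */
        assert(!(c1 && c2));             /* at most one can be set */
        dst[i] = res;
        carry = c1 | c2;                 /* CF for the next "ADC" */
    }
    return carry;
}
```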
The problem is that the TEST instruction clears the CF flag, which the next ADC needs. The CMP instruction would also overwrite CF, so it has the same problem.
I could of course save the flags using LAHF (which copies the low byte of FLAGS, including CF, into AH) before the test and restore them using SAHF at the beginning of the loop, but I would like to avoid that if there is a way around it.
(Accepted answer)
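You can swap the roles of %r8 and %rcx above, so that the count lives in %rcx and the loop can exit with JRCXZ, which jumps when %rcx is zero without reading or modifying any flags: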
clc
.p2align 4 # just a thought...
.Lloop0:
jrcxz .Lloop0_end
...
leaq -1(%rcx), %rcx
jmp .Lloop0
.Lloop0_end:
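A minimal sketch of what the complete JRCXZ version might look like, wrapped in GCC/Clang extended inline assembly so it can be called and tested from C. The function name add_n_jrcxz and the C wrapper are hypothetical; the sketch assumes x86-64 with the default AT&T syntax, takes the loop body to be the question's with the count pinned to %rcx through the "c" constraint, and falls back to portable C on other targets. Because JRCXZ is tested at the top of the loop, a zero length is handled without an extra check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C wrapper around the JRCXZ loop: dst[0..n) += src[0..n).
   Returns the final carry. n == 0 is allowed (JRCXZ exits immediately). */
static unsigned add_n_jrcxz(uint64_t *dst, const uint64_t *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t tmp;
    unsigned char carry;
    __asm__ volatile(
        "clc\n\t"
        "1:\n\t"
        "jrcxz 2f\n\t"                /* leave when %rcx == 0; flags untouched */
        "movq (%[src]), %[tmp]\n\t"
        "adcq %[tmp], (%[dst])\n\t"   /* add with carry, updates CF */
        "leaq 8(%[dst]), %[dst]\n\t"  /* LEA does not modify flags */
        "leaq 8(%[src]), %[src]\n\t"
        "leaq -1(%[n]), %[n]\n\t"     /* decrement count, CF preserved */
        "jmp 1b\n\t"
        "2:\n\t"
        "setc %[carry]\n\t"           /* CF from the last ADC (0 if n == 0) */
        : [dst] "+r"(dst), [src] "+r"(src), [n] "+c"(n),
          [tmp] "=&r"(tmp), [carry] "=r"(carry)
        :
        : "cc", "memory");
    return carry;
#else
    /* Portable fallback with the same semantics on other targets. */
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t a = dst[i], s = a + src[i];
        uint64_t r = s + carry;
        carry = (s < a) | (r < s);
        dst[i] = r;
    }
    return carry;
#endif
}
```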
Update: jrcxz inside a loop is never a good idea, whereas dec / jnz benefits from macro-fusion on recent micro-architectures. Perhaps I was overly pessimistic (10 years ago!) about the partial-flags stall introduced by dec, which writes ZF but leaves CF unchanged; it is now an unequivocally better solution. That is, replace the last three lines of the original loop (the leaq decrement, the testq and the jne) with decq %r8 and jnz .Lloop0:
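andq %r8, %r8 # CF = 0 (and ZF = 1 if the length is zero)
# jz .Lloop0_end # needed if the length can be zero; otherwise dec wraps %r8 to 2^64 - 1 and overruns the arrays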
.p2align 4
.Lloop0:
movq (%rdx), %rax
leaq 8(%rdx), %rdx
adcq %rax, (%rcx)
leaq 8(%rcx), %rcx
decq %r8
jnz .Lloop0
.Lloop0_end:
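The final dec/jnz loop can be sketched the same way, as GCC/Clang extended inline assembly with a portable fallback. The wrapper name add_n is hypothetical, and x86-64 with the default AT&T syntax is assumed. Since the jz guard is left commented out above, the sketch asserts that n is nonzero; SETC after the loop reads the carry out of the last ADC, which DEC left intact.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper around the dec/jnz loop: dst[0..n) += src[0..n).
   Returns the final carry. n must be nonzero (the jz guard is omitted). */
static unsigned add_n(uint64_t *dst, const uint64_t *src, size_t n)
{
    assert(n != 0);
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t tmp;
    unsigned char carry;
    __asm__ volatile(
        "andq %[n], %[n]\n\t"          /* CF = 0 */
        "1:\n\t"
        "movq (%[src]), %[tmp]\n\t"
        "leaq 8(%[src]), %[src]\n\t"   /* LEA leaves flags alone */
        "adcq %[tmp], (%[dst])\n\t"
        "leaq 8(%[dst]), %[dst]\n\t"
        "decq %[n]\n\t"                /* DEC sets ZF but preserves CF */
        "jnz 1b\n\t"
        "setc %[carry]\n\t"            /* CF from the last ADC */
        : [dst] "+r"(dst), [src] "+r"(src), [n] "+r"(n),
          [tmp] "=&r"(tmp), [carry] "=r"(carry)
        :
        : "cc", "memory");
    return carry;
#else
    /* Portable fallback with the same semantics on other targets. */
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t a = dst[i], s = a + src[i];
        uint64_t r = s + carry;
        carry = (s < a) | (r < s);
        dst[i] = r;
    }
    return carry;
#endif
}
```

The "cc" and "memory" clobbers tell the compiler that the flags and the arrays behind the pointers are modified inside the asm statement.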