In x86/x86_64, how can I "jump if not zero" without affecting the carry flag?


I have this loop that essentially adds two dynamic-width integers (in reality, it is unrolled a bit, but that does not matter here). Register RCX contains the destination address, RDX contains the source address, and R8 contains the length of the arrays in 64-bit words.

    clc                    # Clear CF flag before beginning
.Lloop0:
    movq    (%rdx), %rax   # Load memory for next addition
    adcq    %rax, (%rcx)   # Perform addition with carry (i.e. using CF flag)
    leaq    8(%rcx), %rcx  # Increment destination address (without affecting CF)
    leaq    8(%rdx), %rdx  # Increment source address (without affecting CF)
    leaq    -1(%r8), %r8   # Decrement length (without affecting CF)
    testq   %r8, %r8       # Test if length is zero (affects CF!)
    jne     .Lloop0        # Loop while length is nonzero

The problem is that the TEST instruction clears the CF flag, which the next ADC needs. CMP is no better: it sets CF according to the borrow of its implicit subtraction, so it overwrites the carry as well.

I could of course copy the FLAGS register using LAHF before the test and restore it using SAHF at the beginning of the loop, but I would like to avoid that if there is a way around it.
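
For reference, the LAHF/SAHF workaround described above might look roughly like the sketch below. It is an illustration only, assuming the same register roles as the loop above and a length of at least one. Note that in 64-bit mode LAHF and SAHF are only available when the corresponding CPUID feature bit is set, which some early x86-64 processors lack.

```asm
    clc                    # CF = 0 for the first addition
    lahf                   # AH = low byte of FLAGS, so the first sahf restores CF = 0
.Lloop0:
    sahf                   # Restore CF saved at the end of the previous iteration
    movq    (%rdx), %rax   # Load next source word (overwrites AH, already consumed)
    adcq    %rax, (%rcx)   # Perform addition with carry
    leaq    8(%rcx), %rcx  # Increment destination address (without affecting CF)
    leaq    8(%rdx), %rdx  # Increment source address (without affecting CF)
    leaq    -1(%r8), %r8   # Decrement length (without affecting CF)
    lahf                   # Save CF from adc into AH before test destroys it
    testq   %r8, %r8       # Sets ZF for the branch, clears CF
    jne     .Lloop0
```

When the loop exits, the carry out of the last addition is in bit 0 of AH rather than in CF, because the final TEST has already cleared CF.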


Solution

  • (Accepted answer)
    You can swap the roles of %r8 and %rcx above to yield:

        clc
        .p2align 4 # just a thought...
    .Lloop0:
        jrcxz .Lloop0_end
        ...
        leaq  -1(%rcx), %rcx
        jmp   .Lloop0
    .Lloop0_end:
    

    Update: jrcxz inside a loop is never a good choice, whereas dec / jnz benefits from macro-fusion on recent micro-architectures. Perhaps I was overly pessimistic (10 years ago!) about the partial-flags stall introduced by dec, which updates ZF but leaves CF untouched. It is now an unequivocally better solution: replace the last three lines of the original loop (leaq -1, testq, jne) with decq %r8 followed by jnz .Lloop0:

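        andq    %r8,            %r8     # CF = 0 (ZF is set if the length is zero)
    #   jz      .Lloop0_end             # would skip the loop for a zero length
        .p2align 4
    .Lloop0:
        movq    (%rdx),         %rax    # load next source word
        leaq    8(%rdx),        %rdx    # advance source (CF untouched)
        adcq    %rax,           (%rcx)  # add with carry into destination
        leaq    8(%rcx),        %rcx    # advance destination (CF untouched)
        decq    %r8                     # decrement length: sets ZF, preserves CF
        jnz     .Lloop0
    .Lloop0_end: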
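
    Wrapped into a callable routine, the dec / jnz loop could be sketched as follows. This is a minimal illustration, not a tuned implementation. The name add_limbs and the local labels are hypothetical. It assumes the Microsoft x64 calling convention, whose first three integer arguments arrive in RCX, RDX and R8 (matching the question's register usage) and whose return value goes in RAX. It returns the final carry so the caller can propagate it.

    ```asm
        .text
        .globl  add_limbs               # hypothetical name
    add_limbs:                          # rcx = dst, rdx = src, r8 = length in 64-bit words
        xorl    %eax, %eax              # default return value 0
        testq   %r8, %r8                # CF = 0, ZF = 1 if the length is zero
        jz      .Ldone                  # nothing to add
        .p2align 4
    .Lloop:
        movq    (%rdx), %rax            # load next source word
        leaq    8(%rdx), %rdx           # advance source (CF untouched)
        adcq    %rax, (%rcx)            # add with carry into destination
        leaq    8(%rcx), %rcx           # advance destination (CF untouched)
        decq    %r8                     # sets ZF, preserves CF
        jnz     .Lloop
        setc    %al                     # CF still holds the carry of the last adc
        movzbl  %al, %eax               # return carry-out as 0 or 1
    .Ldone:
        ret
    ```

    The SETC after the loop works only because DEC left CF alone. With the original testq in place, the carry-out would already have been cleared.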