Since −x = not(x)+1 which then implies a-b = a+not(b)+1, would then
sub rax, rcx
be equivalent to
mov temp, rcx
not temp
add rax, temp
add rax, 1
where temp is some register considered to be volatile?
In other words, does the latter affect EFLAGS in the exact same way? If not, how can it be forced to?
Yes, that gets the same integer result in RAX.
In other words, does the latter affect EFLAGS in the exact same way?
No. ZF, SF, and PF depend only on the integer result, but CF and OF (see footnote 1) depend on how you get there. x86's CF carry flag is a borrow output from subtraction. (Unlike some ISAs such as ARM, where subtraction sets the carry flag if there was no borrow.)
Trivial counterexample you could check in your head: 0 - 1 with sub sets CF=1. But your way clears CF.
mov temp, rcx # no effect on FLAGS
not temp # no effect on FLAGS, unlike most other x86 ALU instructions
add rax, temp # temp = ~1 = 0xFF..FE. 0 + anything clears CF
add rax, 1 # 0xFF..FE + 1 = 0xFF..FF = -1. clears CF
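The counterexample can also be checked mechanically. Below is a minimal Python sketch that models only the result and CF of 64-bit add and sub; the helper names (add64, sub64, not_add_add) are made up for illustration and are not part of any library.

```python
MASK = (1 << 64) - 1  # model 64-bit registers


def add64(a, b):
    """Return (result, CF) for a 64-bit ADD: CF is the carry out of bit 63."""
    total = (a & MASK) + (b & MASK)
    return total & MASK, total >> 64


def sub64(a, b):
    """Return (result, CF) for a 64-bit SUB: CF is the borrow (a < b, unsigned)."""
    a &= MASK
    b &= MASK
    return (a - b) & MASK, int(a < b)


def not_add_add(a, b):
    """The question's sequence: not temp / add rax, temp / add rax, 1.

    Returns the result and the CF left behind by the final add.
    """
    temp = ~b & MASK
    r, _ = add64(a, temp)  # this add's CF is overwritten by the next add
    return add64(r, 1)


print(sub64(0, 1))        # same result as below, CF = 1
print(not_add_add(0, 1))  # same result as above, CF = 0
```

Both calls produce 0xFF..FF as the result, but only the sub model reports a borrow.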
(Fun fact: the not instruction doesn't affect FLAGS, unlike most other ALU instructions, including neg, which sets flags the same as sub from 0. A strange quirk of x86 history. https://www.felixcloutier.com/x86/not#flags-affected)
Footnote 1: AF, the half-carry (auxiliary) flag from the low to high nibble in the low byte, also depends on how you get there. You can't branch on it directly, and x86-64 removed the BCD instructions like aaa that read it, but it's still there in RFLAGS, where you can read it with pushf / pop rax, for example.
If not, how can it be forced to?
Use different instructions. The easiest and most efficient way to get the desired effect on EFLAGS would be to optimize it back to sub rax, rcx. That's why x86 has sub and sbb instructions. If that's what you want, use it.
Emulating sub, including its FLAGS output
You definitely need to avoid something like add rax,1 as the last step. That would set CF only if the final result is zero, wrapping from ULONG_MAX = -1.
Doing x -= y as x += -y works for OF in most cases. (But not for the most-negative number y=LONG_MIN (1UL<<63), where neg rcx would overflow.)
But CF tells you about the 65-bit full result of 64-bit + 64-bit addition or subtraction. 64-bit negation isn't sufficient: x += -y doesn't always set CF opposite of what x -= y would.
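To make the two failure cases concrete, here is a small Python sketch that compares the CF and OF a 64-bit sub would produce against those left by the add in a neg / add sequence. The function names are hypothetical, and the flag formulas are the usual textbook definitions (borrow for CF, sign-based signed-overflow test for OF), not taken from the original answer.

```python
MASK = (1 << 64) - 1
SIGN = 1 << 63


def sub_flags(x, y):
    """(result, CF, OF) as x86 SUB sets them for 64-bit x - y."""
    r = (x - y) & MASK
    cf = int(x < y)  # unsigned borrow
    # Signed overflow: operands have different signs and the result's sign
    # differs from x's sign.
    of = int(((x ^ y) & (x ^ r) & SIGN) != 0)
    return r, cf, of


def neg_add_flags(x, y):
    """(result, CF, OF) left by the ADD in: neg y / add x, y."""
    ny = -y & MASK
    total = x + ny
    r = total & MASK
    cf = total >> 64  # carry out of bit 63
    # Signed overflow for addition: operands have the same sign and the
    # result's sign differs from it.
    of = int((~(x ^ ny) & (x ^ r) & SIGN) != 0)
    return r, cf, of


LONG_MIN = SIGN  # bit pattern of the most-negative 64-bit integer

print(sub_flags(5, 3), neg_add_flags(5, 3))  # CF is inverted here
print(sub_flags(0, 0), neg_add_flags(0, 0))  # CF is NOT inverted here
print(sub_flags(0, LONG_MIN), neg_add_flags(0, LONG_MIN))  # OF differs
```

With y = 0 both versions leave CF clear, so a blanket cmc afterwards would be wrong, and with y = LONG_MIN the sub overflows while the add does not.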
Hardware ALUs normally use a binary Adder–subtractor, which does a single operation that propagates carry through all the bits, rather than a separate add before or after incrementing. That avoids any need to check and combine carry-out (and signed-overflow) results from two separate operations, takes far fewer gates to implement, and has a lower critical-path gate depth.
(Negating the most-negative number overflows; see a previous version of this answer for a failed attempt at emulating with neg / add / cmc, which might work for inputs other than LONG_MIN.)
The trick is to feed in a +1 to the carry input of the low bit for subtraction. (Or flip the existing carry input for sbb.) The pre-processing of B is just conditionally flipping its bits (with XOR gates) to get ~B as an input to A + ~B + 1 = A - B, done with a single add-with-carry.
This construction also makes the overflow and carry outputs from the ALU useful. Signed Overflow is directly usable. The carry output from the ALU is set if there was no borrow, clear if there was. So x86 needs to invert that ALU output to get CF, unlike for addition where it uses it directly. (ARM can always just use that ALU output directly since it has opposite semantics for CF from subs/cmp.)
For example, with small positive 8-bit inputs like 0x05 + (~0x03) + 1 = 0x02: the binary addition wrapped past zero to a small unsigned value, i.e. it had a carry-out. But 5 - 3 = +2 has no borrow. Conversely, 5 - 6 = -1 does have a borrow, because 0x05 + (~0x06) + 1 = 0xFF = -1 doesn't wrap.
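The adder-subtractor idea can be sketched in a few lines of Python. This is a simplified model assuming an 8-bit width, with hypothetical function names; it shows only the sum and the carry-out, and derives an x86-style borrow flag by inverting that carry-out.

```python
def adder(a, b, carry_in, bits=8):
    """One pass through an n-bit adder: returns (sum, carry_out)."""
    mask = (1 << bits) - 1
    total = (a & mask) + (b & mask) + carry_in
    return total & mask, total >> bits


def subtract(a, b, bits=8):
    """Compute A - B as A + ~B + 1 in a single add-with-carry.

    Returns (result, raw carry_out, x86-style CF). The raw carry-out is
    what ARM exposes after a subtraction; x86 inverts it so CF means borrow.
    """
    mask = (1 << bits) - 1
    result, carry_out = adder(a, ~b & mask, 1, bits)
    return result, carry_out, carry_out ^ 1


print(subtract(5, 3))  # (2, 1, 0): carry-out set, so no borrow
print(subtract(5, 6))  # (255, 0, 1): no carry-out, so a borrow
```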
; Emulate SUB RAX, RCX. temp can be RDX for example
mov temp, rcx
not temp
stc ; CF = 1
adc rax, temp ; rax += ~rcx + 1
cmc ; CF = !CF like x86 sub does
; all flags except AF set like sub rax, rcx
To perfectly emulate sub, we'd also need to flip the AF bit (nibble-carry from bit #3 to bit #4). If you care, perhaps pushf / xor qword [rsp], 0x11 (flip CF and AF) / popf instead of cmc, but that's slow. lahf / xor ah, 0x11 / sahf doesn't work, because the xor clears OF, which is unfortunately outside the low 8 bits of FLAGS and so isn't restored by sahf.
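As a sanity check on the not / stc / adc / cmc sequence and the AF caveat, the following Python sketch compares it against a reference model of sub for every pair of inputs. It assumes an 8-bit operand size so the exhaustive loop stays cheap (the same bit logic applies at 64 bits), and the function names are invented for this example.

```python
def flags_sub(a, b, bits=8):
    """Reference model: (result, CF, OF, AF) as SUB sets them."""
    mask, sign = (1 << bits) - 1, 1 << (bits - 1)
    r = (a - b) & mask
    cf = int(a < b)                            # unsigned borrow
    of = int(((a ^ b) & (a ^ r) & sign) != 0)  # signed overflow
    af = int((a & 0xF) < (b & 0xF))            # borrow out of the low nibble
    return r, cf, of, af


def flags_emulated(a, b, bits=8):
    """Model of: mov temp, b / not temp / stc / adc a, temp / cmc."""
    mask, sign = (1 << bits) - 1, 1 << (bits - 1)
    temp = ~b & mask
    total = a + temp + 1                       # adc with CF=1 from stc
    r = total & mask
    cf = (total >> bits) ^ 1                   # cmc inverts the carry-out
    of = int((~(a ^ temp) & (a ^ r) & sign) != 0)
    af = ((a & 0xF) + (temp & 0xF) + 1) >> 4   # adc's AF; cmc leaves it alone
    return r, cf, of, af


def check_all(bits=8):
    """Return True if result, CF and OF always match and AF is always inverted."""
    for a in range(1 << bits):
        for b in range(1 << bits):
            r1, cf1, of1, af1 = flags_sub(a, b, bits)
            r2, cf2, of2, af2 = flags_emulated(a, b, bits)
            if (r1, cf1, of1) != (r2, cf2, of2) or af2 != (af1 ^ 1):
                return False
    return True


print(check_all())
```

ZF, SF, and PF are not modelled separately because they depend only on the result, which the check already compares.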
Related: