Arithmetic identities and EFLAGS (emulate SUB using NOT and ADD?)

Since −x = not(x) + 1, which implies a − b = a + not(b) + 1, would

sub rax, rcx

be equivalent to

mov temp, rcx
not temp
add rax, temp
add rax, 1

where temp is some register considered to be volatile?

In other words, does the latter affect EFLAGS in the exact same way? If not, how can it be forced to?

Solution

Yes, that gets the same integer result in RAX.

In other words, does the latter affect EFLAGS in the exact same way?

No. ZF, SF, and PF only depend on the integer result, but CF and OF¹ depend on how you get there. x86's CF carry flag is a borrow output from subtraction. (Unlike some ISAs such as ARM, where subtraction sets the carry flag if there was no borrow.)

Trivial counterexample you could check in your head:
0 - 1 with sub sets CF=1. But your way clears CF.

mov temp, rcx        # no effect on FLAGS
not temp             # no effect on FLAGS, unlike most other x86 ALU instructions
add rax, temp        # temp = ~1 = 0xFF..FE.  0 + anything clears CF
add rax, 1           # 0xFF..FE + 1 = 0xFF..FF = -1.  clears CF

(Fun fact: not doesn't affect FLAGS, unlike most other ALU instructions including neg. neg sets flags the same as sub from 0. A strange quirk of x86 history. https://www.felixcloutier.com/x86/not#flags-affected)
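To check the 0 - 1 counterexample without an assembler, here is a minimal C sketch that models only the integer result and CF of 64-bit add and sub. The names (flagged, model_add, model_sub, model_not_add_add, check_counterexample) are made up for illustration, and the model assumes CF is the unsigned carry-out for add and the unsigned borrow for sub, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model type: a 64-bit result plus the CF it leaves behind. */
typedef struct {
    uint64_t value;
    bool cf;
} flagged;

/* add dst, src: CF is the carry out of bit 63. */
flagged model_add(uint64_t a, uint64_t b) {
    flagged r;
    r.value = a + b;
    r.cf = r.value < a; /* unsigned wraparound means there was a carry-out */
    return r;
}

/* sub dst, src: CF is the borrow, i.e. a < b as unsigned numbers. */
flagged model_sub(uint64_t a, uint64_t b) {
    flagged r;
    r.value = a - b;
    r.cf = a < b;
    return r;
}

/* mov / not / add / add from the question. mov and not leave FLAGS alone,
   and the second add overwrites whatever CF the first add produced. */
flagged model_not_add_add(uint64_t a, uint64_t b) {
    flagged first = model_add(a, ~b);
    return model_add(first.value, 1);
}

/* 0 - 1: same integer result, different CF. */
void check_counterexample(void) {
    flagged real = model_sub(0, 1);
    flagged emulated = model_not_add_add(0, 1);
    assert(real.value == emulated.value);
    assert(real.cf && !emulated.cf);
}
```

Calling check_counterexample() from a main function passes silently. Trying model_not_add_add(5, 5) shows the other failure mode: the final add wraps to zero and sets CF, where sub would leave it clear.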

Footnote 1: AF, the half-carry flag (auxiliary) from the low to high nibble in the low byte, also depends on how you get there. You can't branch on it directly, and x86-64 removed the BCD instructions like aaa that read it, but it's still there in RFLAGS where you can read it with pushf / pop rax for example.

If not, how can it be forced to?

Use different instructions. The easiest and most efficient way to get the desired effect on EFLAGS would be to optimize it back to sub rax, rcx. That's why x86 has sub and sbb instructions. If that's what you want, use it.

Emulating `sub` including its FLAGS output

You definitely need to avoid something like add rax,1 as the last step. That would set CF only if the final result is zero, wrapping from ULONG_MAX = -1.

Doing x -= y as x += -y works for OF in most cases (but not for the most-negative number, y = LONG_MIN (1UL<<63), where neg rcx itself overflows).

But CF tells you about the 65-bit full result of 64 + 64-bit addition or subtraction. 64-bit negation isn't sufficient: x += -y doesn't always set CF opposite of what x -= y would.
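Both failure cases are easy to reproduce in a small C model of neg followed by add. This is only a sketch: flags64, model_add, model_sub, model_neg_add and check_neg_add are hypothetical names, and OF is computed with the usual sign-bit tests, which is an assumption about how to model it rather than a description of any particular CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model type: 64-bit result plus the CF and OF it produces. */
typedef struct {
    uint64_t value;
    bool cf;
    bool of;
} flags64;

flags64 model_add(uint64_t a, uint64_t b) {
    flags64 f;
    f.value = a + b;
    f.cf = f.value < a;
    /* signed overflow: both inputs share a sign and the result does not */
    f.of = (((a ^ f.value) & (b ^ f.value)) >> 63) != 0;
    return f;
}

flags64 model_sub(uint64_t a, uint64_t b) {
    flags64 f;
    f.value = a - b;
    f.cf = a < b;
    /* signed overflow: inputs differ in sign and the result's sign differs from a */
    f.of = (((a ^ b) & (a ^ f.value)) >> 63) != 0;
    return f;
}

/* neg rcx / add rax, rcx: the final flags come from the add alone. */
flags64 model_neg_add(uint64_t a, uint64_t b) {
    return model_add(a, 0 - b);
}

void check_neg_add(void) {
    const uint64_t long_min = UINT64_C(1) << 63;
    /* y = LONG_MIN: 0 - LONG_MIN overflows, but 0 + LONG_MIN does not. */
    assert(model_sub(0, long_min).of && !model_neg_add(0, long_min).of);
    /* 5 - 3: CF comes out opposite, OF agrees. */
    assert(!model_sub(5, 3).cf && model_neg_add(5, 3).cf);
    assert(model_sub(5, 3).of == model_neg_add(5, 3).of);
    /* y = 0: CF is the same in both, so a blanket invert would be wrong here. */
    assert(!model_sub(5, 0).cf && !model_neg_add(5, 0).cf);
}
```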

Real ALUs do the +1 with carry-in, not as a separate step

Hardware ALUs normally use a binary Adder–subtractor, which does a single operation that propagates carry through all the bits, rather than a separate add before or after incrementing. That avoids any need to check and combine the carry-out (and signed-overflow) results of two separate operations, and it takes far fewer gates and a shorter critical path.

(Negating the most-negative number overflows; see a previous version of this answer for a failed attempt at emulating with neg/add/cmc which might work for inputs other than LONG_MIN.)

The trick is to feed in a +1 to the carry input of the low bit for subtraction. (Or flip the existing carry input for sbb). The pre-processing of B is just conditionally flipping its bits (with XOR gates) to get ~B as an input to A + ~B + 1 = A - B done with a single add-with-carry.

This construction also makes the overflow and carry outputs from the ALU useful. Signed Overflow is directly usable. The carry output from the ALU is set if there was no borrow, clear if there was. So x86 needs to invert that ALU output to get CF, unlike for addition where it uses it directly. (ARM can always just use that ALU output directly since it has opposite semantics for CF from subs/cmp.)

For example, with small positive inputs like 0x05 + (~0x03) + 1 = 0x05 + 0xFC + 1 = 0x02, the binary addition wrapped past zero to a small unsigned value, i.e. it had a carry-out. But 5 - 3 = +2 has no borrow. Conversely, 5 - 6 = -1 does have a borrow, and 0x05 + (~0x06) + 1 = 0x05 + 0xF9 + 1 = 0xFF = -1 doesn't wrap, so the adder produces no carry-out.
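The carry-in trick can be modelled at 8 bits in a few lines of C. This is a sketch of the idea, not of any real ALU: adder_out, add_with_carry, alu_subtract, x86_cf_after_sub and check_borrow_is_inverted_carry are hypothetical names, and the 8-bit width is chosen only so every input pair can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model type: the outputs of one pass through an 8-bit adder. */
typedef struct {
    uint8_t sum;
    bool carry_out;
} adder_out;

/* A single add-with-carry: a + b + carry_in, with the carry out of bit 7. */
adder_out add_with_carry(uint8_t a, uint8_t b, bool carry_in) {
    unsigned wide = (unsigned)a + b + (carry_in ? 1u : 0u);
    adder_out out;
    out.sum = (uint8_t)wide;
    out.carry_out = (wide >> 8) != 0;
    return out;
}

/* Subtraction: flip B's bits (the XOR stage) and feed 1 into the carry input. */
adder_out alu_subtract(uint8_t a, uint8_t b) {
    return add_with_carry(a, (uint8_t)~b, true);
}

/* x86 semantics: CF after sub is the inverted adder carry-out (a borrow). */
bool x86_cf_after_sub(uint8_t a, uint8_t b) {
    return !alu_subtract(a, b).carry_out;
}

/* Every byte pair: the sum is a - b, and carry-out means "no borrow". */
void check_borrow_is_inverted_carry(void) {
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            adder_out out = alu_subtract((uint8_t)a, (uint8_t)b);
            assert(out.sum == (uint8_t)(a - b));
            assert(out.carry_out == (a >= b));
        }
    }
}
```

With this model, alu_subtract(5, 3) gives sum 0x02 with a carry-out, and alu_subtract(5, 6) gives 0xFF without one, so x86_cf_after_sub reports a borrow only for 5 - 6.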

; Emulate  SUB RAX, RCX.   temp can be RDX for example
   mov  temp, rcx
   not  temp
   stc             ; CF = 1
   adc  rax, temp  ; rax += ~rcx + 1
   cmc             ; CF = !CF  like x86 sub does
; all flags except AF set like  sub rax, rcx

To perfectly emulate sub we'd also need to flip AF (the nibble carry from bit #3 to bit #4). If you care, pushf / xor qword [rsp], 0x11 (flip CF and AF) / popf would do it in place of cmc, but that's slow. lahf / xor ah, 0x11 / sahf is cheaper and leaves OF alone (OF is outside the low 8 bits of FLAGS, so sahf doesn't write it), but lahf overwrites AH, which is part of the RAX result, so you'd have to do the arithmetic in a different register.
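The claim about which flags match can be checked exhaustively with an 8-bit C model. This is a sketch under stated assumptions: the function names are hypothetical, the bit positions follow the FLAGS layout (CF bit 0, PF bit 2, AF bit 4, ZF bit 6, SF bit 7, OF bit 11), and 8-bit operands stand in for 64-bit ones on the assumption that the flags are defined the same way at every width.

```c
#include <assert.h>
#include <stdint.h>

enum { CF = 0x001, PF = 0x004, AF = 0x010, ZF = 0x040, SF = 0x080, OF = 0x800 };

/* ZF, SF and PF depend only on the 8-bit result. */
unsigned result_flags(uint8_t r) {
    unsigned ones = 0;
    for (unsigned bit = 0; bit < 8; bit++) ones += (r >> bit) & 1u;
    return (r == 0 ? ZF : 0) | (r & 0x80 ? SF : 0) | (ones % 2 == 0 ? PF : 0);
}

/* FLAGS after an 8-bit sub a, b: CF and AF are borrows. */
unsigned flags_sub(uint8_t a, uint8_t b) {
    uint8_t r = (uint8_t)(a - b);
    unsigned f = result_flags(r);
    if (a < b) f |= CF;
    if ((a & 0xF) < (b & 0xF)) f |= AF;
    if ((a ^ b) & (a ^ r) & 0x80) f |= OF;
    return f;
}

/* FLAGS after an 8-bit adc a, b with carry_in 0 or 1: CF and AF are carries. */
unsigned flags_adc(uint8_t a, uint8_t b, unsigned carry_in) {
    unsigned wide = a + b + carry_in;
    uint8_t r = (uint8_t)wide;
    unsigned f = result_flags(r);
    if (wide > 0xFF) f |= CF;
    if ((a & 0xF) + (b & 0xF) + carry_in > 0xF) f |= AF;
    if ((a ^ r) & (b ^ r) & 0x80) f |= OF;
    return f;
}

/* not / stc / adc, before any fix-up of the flags. */
unsigned flags_not_stc_adc(uint8_t a, uint8_t b) {
    return flags_adc(a, (uint8_t)~b, 1);
}

void check_all_byte_pairs(void) {
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            unsigned real = flags_sub((uint8_t)a, (uint8_t)b);
            unsigned raw = flags_not_stc_adc((uint8_t)a, (uint8_t)b);
            /* After cmc (flip CF), the only flag that still differs is AF. */
            assert(((raw ^ CF) ^ real) == AF);
            /* Flipping CF and AF together (xor 0x11) matches sub exactly. */
            assert((raw ^ 0x11) == real);
        }
    }
}
```

Running check_all_byte_pairs() covers all 65536 input pairs, which is the reason for modelling at byte width.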
