Search code examples
performanceoptimizationassemblyx86mov

Cost of swapping variables through mov, xor


Lets swap 2 variables.

int temp = a;
a = b;
b = temp;

Here's some half-optimized asm pseudocode:

mov eax, dword ptr [rbp+4]
mov ebx, dword ptr [rbp+8]
mov dword ptr [rbp+8], eax
mov dword ptr [rbp+4], ebx

Would it be faster to xor the objects with eachother?

a ^= b ^= a ^= b;

asm psuedocode:

mov eax, dword ptr[rbp+4]
xor eax, dword ptr[rbp+8]
xor dword ptr[rbp+8], eax
xor eax, dword ptr[rbp+8]
xor dword ptr[rbp+8], eax
mov eax, dword ptr[rbp+4]

Which of these would be faster? (Guestimates welcome)


Solution

  • Pulling it into two registers then writing back swapping the contents is likely the fastest of the solutions. Four memory cycles, four instructions, two registers. assuming the data has to start in and return to ram, then you cant beat this approach in general.

    Four xors assuming you could do memory for sources and destinations, is three cycles per xor, 12 memory cycles, that is a definite loser. using registers to avoid two mem operands just adds more instructions.

    Your asm pseudocode is 6 memory cycles. 6 instructions one register. The four cycles, four instructions two registers is likely cheaper. Now if you have to do two memory cycles to free up those registers it becomes 6 cycles. where this last one would be one additional to free up the register so 7. 6 is still cheaper than 7 and 5 instructions cheaper than 7, instruction size was not counted here but adds to memory cycles although fetching is likely done in an efficient manner (in good sized aligned chunks).

    If the data were already in registers, then using a third register and doing the three instruction tmp = a, a = b, b = tmp is three operations three registers and the fastest. But if you simply cant spare a register then four xors is faster.

    Thats all a generic high level view, there are likely processors and cache situations, etc that can make one solution that appears to be faster not end up being faster certainly for one test but perhaps in general depending on the situation.