Tags: assembly, x86-64, micro-optimization, addressing-mode

Address-size override prefix in 64-bit or using 64-bit registers


In 64-bit assembly addressing, which way is better?

mov   cl, BYTE [ebx + .DATA]

or

mov   cl, BYTE [rbx + .DATA]

?

The machine code for the first way is 67 8a 4b .. and for the second way it is 8a 4b ..

So if we use a 32-bit register, we need a '0x67' prefix (the address-size override prefix), so I think we've added extra work.

But I heard something about (CACHE) and that it's better to use '32-bit' instead of '64-bit'.

So which way is better overall, and why?


Solution

  • TL;DR: you basically never want address-size prefixes. Use 64-bit addressing modes.

    I heard something about (CACHE) and it's better to use '32-bit' instead of '64-bit'

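    You're mixing up address-size with operand-size. 32-bit integers take half the space of 64-bit integers, so twice as many fit in one cache line: better spatial locality and less memory bandwidth. That applies to data in memory, not to the registers used in an addressing mode.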

    The defaults in 64-bit mode were chosen for a reason, and are what you should prefer when convenient, to save code-size when all else is equal (The advantages of using 32bit registers/instructions in x86-64):

    • address size = 64-bit
    • operand-size = 32-bit

    So something like mov ecx, [rdi] is the most efficient case; other sizes need REX or other prefixes. Byte operand-size uses different opcodes instead of prefixes but writing to 8-bit registers can have false dependencies on the old value of the full register. Prefer movzx loads; that's generally worth the extra byte of code-size for a 2-byte opcode.
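    To make the prefix costs concrete, here is a small NASM-syntax sketch of the same load with different operand and address sizes. The choice of RDI and ECX is only for illustration; the bytes in the comments are the standard encodings for these exact operands.

    ```nasm
    ; Same load, different operand-size / address-size (64-bit mode).
    mov    ecx, [rdi]          ; 8B 0F      defaults: no prefix needed
    mov    rcx, [rdi]          ; 48 8B 0F   REX.W for 64-bit operand-size
    mov    cx,  [rdi]          ; 66 8B 0F   operand-size prefix for 16-bit
    mov    ecx, [edi]          ; 67 8B 0F   address-size prefix for 32-bit addressing
    mov    cl,  [rdi]          ; 8A 0F      byte opcode, but merges into the old RCX
    movzx  ecx, byte [rdi]     ; 0F B6 0F   one byte longer, writes the full register
    ```

    Only the 32-bit operand-size, 64-bit address-size forms avoid every prefix; the movzx load spends its extra byte on the two-byte opcode rather than on a prefix.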


    If your index is already correctly zero-extended to 64 bits, avoid the address-size prefix and use

    movzx ecx,  byte [rbx + .DATA]
    

    Writing a 32-bit register implicitly zero-extends to 64-bit, so you can save cache footprint by using 32-bit data in memory.

    If an index might not be correctly zero- or sign-extended to address-size, you might need an extra instruction to make that happen (movsxd rcx, ebx or mov ecx, ebx) so you can use a 64-bit addressing mode.
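    A minimal sketch of that extra extension step, assuming (hypothetically) that RDI holds the array base address and EBX holds a 32-bit index whose upper half in RBX may be garbage:

    ```nasm
    ; Unsigned 32-bit index: a 32-bit mov zero-extends into the full RCX.
    mov    ecx, ebx
    movzx  eax, byte [rdi + rcx]   ; 64-bit addressing mode, no 0x67 prefix

    ; Signed 32-bit index: sign-extend to 64 bits instead.
    movsxd rcx, ebx
    movzx  eax, byte [rdi + rcx]
    ```

    Using a base register instead of a disp32 symbol keeps the sketch valid in position-independent code as well.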

    [reg + sign_extended_disp32] addressing modes are an interesting case: they only work at all if the symbol address fits in 32 bits. If you know that the whole array is in the low 4GiB of virtual address space, you could maybe get away with [ebx + .DATA] to avoid an extra instruction to extend to 64 bits, if you knew there might be garbage in the high half of RBX. (So static addresses in user-space, but maybe not in a high-half kernel where you might have static data in the high 32 bits of 64-bit virtual address space.)


    If you know your pointers can be safely truncated to 32-bit (e.g. mmap(MAP_32BIT) or using the x32 ABI), you could even traverse a linked list or tree with an instruction like mov edi, [rdi] in a loop. Possibly useful for pointer-heavy data structures.
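    As a rough sketch of that list traversal, assume a hypothetical node layout with a 32-bit next pointer at offset 0, a null pointer terminating the list, and every node allocated in the low 4GiB. The label names are made up for the example.

    ```nasm
    ; Input: EDI = 32-bit pointer to the first node (zero-extended into RDI), or 0.
    ; Output: EDI = 0 once the end of the list is reached.
    walk_list:
        test   edi, edi        ; empty list?
        jz     .done
    .next:
        mov    edi, [rdi]      ; load the 32-bit next pointer; the write zero-extends into RDI
        test   edi, edi
        jnz    .next
    .done:
        ret
    ```

    The load uses a 64-bit addressing mode, so no address-size prefix is needed, while the nodes store only 4 bytes per pointer.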

    (Your question was about array indices, not pointers; in asm you usually want to treat them as 32-bit unsigned integers, or 64 if arrays can be big. Or use pointers instead of [reg+disp32] to loop over an array; a disp32 absolute address only works in a Linux position-dependent executable, or Windows LARGEADDRESSAWARE=no.)