Tags: assembly, gcc, x86-64

Can every MOV instruction be replaced with a PUSH/POP?


Just curious whether every MOV in a program could be replaced with PUSH/POP.

I understand such replacements are impractical and inefficient.

This godbolt example shows a standard printf call using MOV and another printf with PUSH/POP for comparison. My intuition says it's possible, but there are probably some gotchas along the way?

#include <stdio.h>

char format_string[] asm("format_string") = "%d %d %d %d %d\n";

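void MOV_printf() {
    __asm__ (
        "subq $128, %%rsp\n\t"                  /* step over the 128-byte red zone */
        "lea format_string(%%rip), %%rdi\n\t"
        "movq $1, %%rsi\n\t"
        "movq $2, %%rdx\n\t"
        "movq $3, %%rcx\n\t"
        "movq $4, %%r8\n\t"
        "movq $5, %%r9\n\t"
        "movq $0, %%rax\n\t"                    /* AL = 0: variadic call with no vector register args */
        "call printf\n\t"
        "addq $128, %%rsp\n"                    /* must match the subq above */
        /* a call clobbers every caller-saved register, memory and flags */
        ::: "rax", "rdi", "rsi", "rdx", "rcx", "r8", "r9", "r10", "r11", "memory", "cc"
    );
}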

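void PUSH_POP_printf() {
    __asm__ (
        "subq $128, %%rsp\n\t"                  /* step over the 128-byte red zone */
        "lea format_string(%%rip), %%rdi\n\t"
        "pushq $1\n\t"
        "popq %%rsi\n\t"
        "pushq $2\n\t"
        "popq %%rdx\n\t"
        "pushq $3\n\t"
        "popq %%rcx\n\t"
        "pushq $4\n\t"
        "popq %%r8\n\t"
        "pushq $5\n\t"
        "popq %%r9\n\t"
        "pushq $0\n\t"                          /* AL = 0: variadic call with no vector register args */
        "popq %%rax\n\t"
        "call printf\n\t"
        "addq $128, %%rsp\n"                    /* must match the subq above */
        /* a call clobbers every caller-saved register, memory and flags */
        ::: "rax", "rdi", "rsi", "rdx", "rcx", "r8", "r9", "r10", "r11", "memory", "cc"
    );
}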

int main() {
    MOV_printf();
    PUSH_POP_printf();
    return 0;
} 

Solutions

MOV r64, imm64 --- Replace with four pushw instructions and a popq, pushing the most significant 16 bits first.
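For illustration, here is a minimal sketch of that replacement in GCC inline asm. The function name load_imm64_via_pushw and the constant are made up for this example, and the lea pair that steps over the red zone is scaffolding that assumes the x86-64 System V ABI. The most significant 16 bits go first because the stack grows down and x86 is little-endian.

```c
#include <stdint.h>

/* Hypothetical helper: builds the constant 0x1122334455667788 in a register
   without any MOV, using four 16-bit pushes and one 64-bit pop. */
uint64_t load_imm64_via_pushw(void) {
    uint64_t result;
    __asm__ volatile (
        "lea -128(%%rsp), %%rsp\n\t"   /* scaffolding: step over the red zone */
        "pushw $0x1122\n\t"            /* bits 63:48, ends up at the highest address */
        "pushw $0x3344\n\t"            /* bits 47:32 */
        "pushw $0x5566\n\t"            /* bits 31:16 */
        "pushw $0x7788\n\t"            /* bits 15:0, ends up at the lowest address */
        "popq %0\n\t"                  /* read all 8 bytes back as one qword */
        "lea 128(%%rsp), %%rsp\n\t"    /* scaffolding: restore the stack pointer */
        : "=r"(result)
        :
        : "memory"
    );
    return result;
}
```

Calling load_imm64_via_pushw() should return 0x1122334455667788. The four pushw move RSP down by 8 bytes in total and the popq moves it back up by 8, so the stack stays balanced.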

MOV AH,DL --- Simulate it using push/pop and a scratch buffer.
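One way the scratch-buffer idea could look, as a sketch rather than the linked example itself: the function name, the 16-byte static buffer and its layout are all assumptions made here. It relies on partially overlapping 16-bit pops to drop DL into byte 1 of a saved copy of RAX, then repairs the neighbouring bytes from a backup copy. The static buffer makes it non-reentrant and not thread-safe.

```c
#include <stdint.h>

/* Hypothetical helper: emulates `mov %dl, %ah` with only push/pop.
   Takes the old RAX and RDX values and returns the new RAX value. */
uint64_t mov_ah_dl_via_push_pop(uint64_t rax_in, uint64_t rdx_in) {
    static unsigned char scratch[16];  /* assumption: shared scratch space */
    __asm__ volatile (
        "lea -128(%%rsp), %%rsp\n\t"   /* scaffolding: step over the red zone */
        "pushq %%rax\n\t"
        "popq  (%[buf])\n\t"           /* scratch[0..7]  = RAX (working copy) */
        "pushq %%rax\n\t"
        "popq  8(%[buf])\n\t"          /* scratch[8..15] = RAX (backup copy) */
        "pushw %%dx\n\t"
        "popw  1(%[buf])\n\t"          /* scratch[1] = DL, scratch[2] = DH (unwanted) */
        "pushw 10(%[buf])\n\t"
        "popw  2(%[buf])\n\t"          /* restore bytes 2 and 3 from the backup */
        "pushq (%[buf])\n\t"
        "popq  %%rax\n\t"              /* RAX = old RAX with byte 1 replaced by DL */
        "lea 128(%%rsp), %%rsp\n\t"    /* scaffolding: restore the stack pointer */
        : "+a"(rax_in)
        : "d"(rdx_in), [buf] "r"(scratch)
        : "memory"
    );
    return rax_in;
}
```

With RAX = 0x1122334455667788 and DL = 0x99, the result should be 0x1122334455669988: only the AH byte changes.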

Gotchas

A mov is required to access these registers:

  • Control registers (CR0, CR2, CR3, CR4, etc.)
  • Debug registers (DR0, DR1, DR2, etc.)

Solution

  • Byte stores like mov %al, (%rdi) are not possible with push/pop, which only come in 16-bit and 64-bit operand sizes in 64-bit mode. Any emulation that loads the containing word or qword and stores it back won't be thread-safe; a non-atomic RMW of the containing word could step on a store to the other byte by another thread. (Can modern x86 hardware not store a single byte to memory? - it can, so can most ISAs, despite misconceptions.)

    If you're willing to accept non-thread-safe emulation, then perhaps partially-overlapping pop m16 operations could construct a word with the value you're looking for in a static buffer, which you can then push m16 / pop m16 to copy over the original byte.

    But you won't know whether the byte at (%rdi) is the low or high byte of the 16-bit word that contains it, so you won't know which of -1(%rdi) or 0(%rdi) you can access without possibly segfaulting by going into the next page. Only an aligned 16-bit load / store is guaranteed not to cross any wider boundaries (like 4k page), and thus can't page-fault if the word contains any bytes you know are valid. (Is it safe to read past the end of a buffer within the same page on x86 and x64?)

    push/pop alone can't check the low bit of %rdi and branch accordingly.

    (x86-64 makes it impossible to have segment limits, which in 32-bit mode could have been an odd number of bytes in the general case, not assuming a flat memory model. But x86-64 still allows odd segment bases for FS and GS, I think, so mov %al, %fs:(%rdi) is even more unknown; even if you could test $1, %dil ; jnz, that still wouldn't tell you whether the linear address was odd or even.)


    Also, in addition to debug and control registers, x86-64 removed the opcodes for push/pop of segment registers other than FS/GS (see https://www.felixcloutier.com/x86/push). So mov ds, eax is not emulatable either.
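To make the non-thread-safe byte-store emulation from the answer above concrete, here is a rough sketch. It assumes the easy case only: the byte after the destination (dest[1]) is known to be readable and writable, which is exactly what you can't know in general. The function name and the static buffer layout are invented for this example, and the lea pair is red-zone scaffolding for the x86-64 System V ABI.

```c
#include <stdint.h>

/* Hypothetical helper: emulates `mov %al, (%rdi)` with only push/pop.
   Assumption: dest[1] is also valid memory. Not thread-safe: dest[1] is
   read and later written back, so a concurrent store to it can be lost. */
void store_byte_via_push_pop(unsigned char *dest, unsigned char value) {
    static unsigned char tmp[8];       /* assumption: shared scratch space */
    __asm__ volatile (
        "lea -128(%%rsp), %%rsp\n\t"   /* scaffolding: step over the red zone */
        "pushw (%[dst])\n\t"
        "popw  (%[tmp])\n\t"           /* tmp[0] = dest[0], tmp[1] = dest[1] */
        "pushw %%ax\n\t"
        "popw  4(%[tmp])\n\t"          /* tmp[4] = AL, tmp[5] = AH (unwanted) */
        "pushw 1(%[tmp])\n\t"
        "popw  5(%[tmp])\n\t"          /* tmp[5] = dest[1], overlapping the unwanted AH */
        "pushw 4(%[tmp])\n\t"
        "popw  (%[dst])\n\t"           /* dest[0] = AL, dest[1] rewritten with its old value */
        "lea 128(%%rsp), %%rsp\n\t"    /* scaffolding: restore the stack pointer */
        :
        : "a"((uint64_t)value), [dst] "r"(dest), [tmp] "r"(tmp)
        : "memory"
    );
}
```

The window between the first pushw from dest and the final popw to dest is where another thread's store to dest[1] would be overwritten with the stale value, which is the RMW hazard described in the answer.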