Tags: assembly, x86, scanf, x86-64, gnu-assembler

Using scanf with x86-64 GAS assembly


I have been having lots of trouble getting a call to the C library function scanf to work in my x86 assembly program. It currently reads from standard input, but it only avoids a segfault when I type characters (I have no idea why, since the format string is %d). The scanf examples for x86 I've found online are either quirky or written in NASM syntax, so I have tried to adapt them for my program.

f:
    .string "%d"

_main:
    movq    $0,    %rax    #Clean rax
    movq    $f,    %rdi    #Load string format
    movq    %rcx,  %rsi    #Set storage to rcx (Not sure if this is valid)
    call    scanf
    ret

Printing rcx and rax with printf shows 1 and 0 respectively after I enter a char or a string (the only input for which the program doesn't segfault).

Any insight on how to use scanf properly in x86 GAS assembly would be very much appreciated!


Solution

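  • As you feared, movq %rcx, %rsi is not correct. scanf needs a pointer to memory, and registers are not part of the memory address space, so you can't have pointers to them. Your code passes whatever value happens to be in %rcx as an address, so scanf stores the converted integer at an arbitrary location, which usually crashes. This is consistent with your symptoms: when you type a non-digit, %d fails to match, scanf stores nothing and returns 0 in %eax, so the bogus pointer is never dereferenced. You need to allocate storage either globally (in a data section) or locally (on the stack). Incidentally, you should not put your data (especially writable data) into the default .text section, as that is intended for code and is typically read-only. Also, the x86-64 System V calling convention requires %rsp to be 16-byte aligned at each call, and on function entry it is 8 bytes off (the call pushed a return address), so you need to adjust it before calling scanf.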

    .globl main
    
    main:
        push %rbp           # keep stack aligned
        mov  $0, %eax       # clear AL (zero FP args in XMM registers)
        leaq f(%rip), %rdi  # load format string
        leaq x(%rip), %rsi  # set storage to address of x
        call scanf
        pop %rbp
        ret
    
    .data
    
    f:  .string "%d"         # could be in .rodata instead
    x:  .long 0
    

    (If your environment expects a leading underscore on C symbols, as macOS does, then use _main, and probably _scanf.)
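A complete program built on the approach above might look like the following. It is a sketch assuming Linux x86-64 with glibc, assembled and linked with gcc; the labels fmt_in and fmt_out and the file name read_int.s are illustrative, not from the question. It additionally checks scanf's return value (the number of items converted) and passes the integer to printf by value. It uses call scanf@PLT, which links in both PIE and non-PIE executables.

```asm
# Sketch, assuming Linux x86-64 (System V ABI) and glibc.
# Build: gcc read_int.s -o read_int    Run: echo 42 | ./read_int
        .text
        .globl main
main:
        push  %rbp                  # RSP was 16n+8 on entry; now 16-byte aligned
        xor   %eax, %eax            # AL = 0: no vector args for variadic scanf
        leaq  fmt_in(%rip), %rdi    # arg1: format "%d"
        leaq  x(%rip), %rsi         # arg2: address of x (memory, not a register)
        call  scanf@PLT
        cmp   $1, %eax              # scanf returns the number of items converted
        jne   .Lfail                # 0 = no match, -1 (EOF) = no input

        leaq  fmt_out(%rip), %rdi   # arg1: format "%d\n"
        movl  x(%rip), %esi         # arg2: the int value itself, not its address
        xor   %eax, %eax            # AL = 0 for variadic printf
        call  printf@PLT
        xor   %eax, %eax            # return 0 from main
        pop   %rbp
        ret
.Lfail:
        mov   $1, %eax              # return 1 if no integer was read
        pop   %rbp
        ret

        .section .rodata
fmt_in:  .string "%d"
fmt_out: .string "%d\n"

        .data
x:      .long 0

        .section .note.GNU-stack,"",@progbits   # non-executable stack
```

Checking the return value matters here: a non-numeric input makes scanf return 0 without touching x, which is exactly the case the question ran into.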


    There are actually three ways to put the address of a symbol or label into a register. RIP-relative LEA is the standard way on x86-64; see the Q&A "How to load address of function or label into register in GNU Assembler" for details.

    As an optimization, if your variables are in the low 4 GiB of the address space (e.g. in a Linux non-PIE, position-dependent executable, such as one linked with gcc -no-pie), you can use 32-bit absolute immediates:

        mov  $f, %edi       # load format string
        mov  $x, %esi       # set storage to address of x
    

    movq $f, %rdi would use a 32-bit sign-extended immediate (instead of relying on the implicit zero-extension into RDI that writing EDI gives you), making it 7 bytes: the same size as a RIP-relative LEA and 2 bytes larger than mov $f, %edi.

    You can also load the full 64-bit absolute address using the mnemonic movabsq. But don't: it is a 10-byte instruction, which is bad for code size, and it still needs a runtime fixup because it's not position-independent.

        movabsq $f, %rdi # load format string
        movabsq $x, %rsi # set storage to address of x
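
For comparison, here are the address-loading forms discussed above side by side, with the encoded size of each in the comments. This is a sketch: the demo label and the one-byte data are hypothetical, chosen only so the file assembles on its own with as. Only the RIP-relative LEA is position-independent; the other three rely on an absolute address.

```asm
# Assemble only: as sizes.s -o sizes.o ; inspect with objdump -d sizes.o
        .text
demo:
        leaq    f(%rip), %rdi   # 7 bytes: REX.W + 8D + ModRM + disp32 (RIP-relative)
        mov     $f, %edi        # 5 bytes: B8+r + imm32, zero-extends into RDI
        movq    $f, %rdi        # 7 bytes: REX.W + C7 + ModRM + imm32, sign-extended
        movabsq $f, %rdi        # 10 bytes: REX.W + B8+r + imm64
        ret                     # 1 byte
        .data
f:      .byte 0
```

Together the five instructions come to 30 bytes of .text, which makes the code-size argument against movabsq concrete.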
    

    Upon request: using a local variable for the output could look like:

        subq  $8, %rsp       # allocate 8 bytes on the stack (also realigns RSP to 16)
        xor   %eax, %eax     # clear AL (and RAX)
        leaq  f(%rip), %rdi  # load format string
        movq  %rsp, %rsi     # set storage to local variable
        call  scanf
        addq  $8, %rsp       # restore stack
        ret
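The same stack-frame idea extends to more than one local. The sketch below (again assuming Linux x86-64 with glibc; the labels fmt_in and fmt_out and the file name sum2.s are illustrative) reads two ints into the 8 reserved bytes, at 0(%rsp) and 4(%rsp), and prints their sum. Since each %d stores 4 bytes, one 8-byte allocation holds both, and subtracting 8 from the entry value of RSP (16n+8) leaves it 16-byte aligned for both calls.

```asm
# Sketch, assuming Linux x86-64 (System V ABI) and glibc.
# Build: gcc sum2.s -o sum2    Run: echo "3 4" | ./sum2
        .text
        .globl main
main:
        subq  $8, %rsp              # two 4-byte int locals; RSP is now 16-byte aligned
        xor   %eax, %eax            # AL = 0: no vector args for variadic scanf
        leaq  fmt_in(%rip), %rdi    # arg1: "%d %d"
        movq  %rsp, %rsi            # arg2: &a, first local at 0(%rsp)
        leaq  4(%rsp), %rdx         # arg3: &b, second local at 4(%rsp)
        call  scanf@PLT
        cmp   $2, %eax              # were both integers converted?
        jne   .Lfail

        movl  (%rsp), %esi          # esi = a
        addl  4(%rsp), %esi         # esi = a + b (printf's 2nd arg)
        leaq  fmt_out(%rip), %rdi   # arg1: "%d\n"
        xor   %eax, %eax            # AL = 0 for variadic printf
        call  printf@PLT
        xor   %eax, %eax            # return 0
        jmp   .Ldone
.Lfail:
        mov   $1, %eax              # return 1 on bad input or EOF
.Ldone:
        addq  $8, %rsp              # release the locals
        ret

        .section .rodata
fmt_in:  .string "%d %d"
fmt_out: .string "%d\n"

        .section .note.GNU-stack,"",@progbits   # non-executable stack
```

If you need more locals, round the allocation so that 8 plus the amount subtracted is a multiple of 16 (for example 8, 24, 40 bytes) to keep the calls aligned.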