x86 code to read in input into a local variable?

I am new to x86 assembly and have gone through this post discussing the use of scanf in NASM x86. It was quite enlightening, however most of the answers used a global variable by doing resd in the .bss section. Instead, I want to scanf into a local variable as if by the following C program:

  #include <stdio.h>
  int main(void) {
      int x;
      scanf("%d", &x);
      return x;
  }

x86 code:

; demo.asm

global main
extern scanf

section .data
fmt: db "%d", 0x00

section .text
main:
    ; prologue
    push ebp
    mov ebp, esp

    ; body

        ; allocate int x, [ebp - 4] = first local variable 
        sub esp, 4

        ; eax <- &x
        lea eax, [ebp - 4]

        ; pass parameters
        push eax ; push &x on stack (4 byte address)
        push fmt ; push format string on stack (4 byte address)

        ; call function
        call scanf

        ; clean (4 + 4 bytes) stack
        add esp, 8

        ; return number read from user to shell
        mov eax, [ebp - 4]

    ; epilogue
    mov esp, ebp ; redundant?
    pop ebp
    ret

I followed the CDECL calling convention and I am using the C 32-bit runtime as well as the C 32-bit standard library for linking.

$ nasm -f elf32 demo.asm
$ gcc -m32 -o demo demo.o
./demo
10
$ echo $?
10

This works, but I am worried about producing fatal errors in larger code. Is that a risk with this implementation?

Solution

Looks correct except for stack alignment. I'm guessing you're on Linux, given the nasm -f elf32.

Thus you're using the calling convention documented in the i386 System V ABI. It happens to be essentially the same as MS Windows CDECL, and some people call it that, but it differs from Windows in how 64-bit structs are returned for example (in memory on i386 SysV vs. in EDX:EAX in Windows cdecl.) If you say "cdecl" to describe your calling convention, with the context being Linux, most people will know that you mean i386 SysV as opposed to gcc -mregparm=3 or something, but IMO "i386 SysV" is nearly as short and much more specific / accurate as a description.

The i386 System V ABI requires 16-byte stack alignment before a call, thus ESP % 16 == 12 on function entry after call pushes a return address. (This applies in the ABI version used on modern Linux, due to GCC's accidental reliance -mpreferred-stack-boundary=4 which in 32-bit mode was supposed to be optional / best-effort to help performance. Documenting as a new requirement was the least-bad way out of the situation with binaries in the wild that would crash with misaligned ESP. Many other OSes using i386 System V, such as MacOS and FreeBSD, didn't adopt that change, and hand-written asm that only maintains 4-byte stack alignment is still correct there.)

Most of 32-bit library functions don't actually depend on that (e.g. they don't use movaps for 16-byte copies for locals in stack space the way many 64-bit functions do; see glibc scanf Segmentation faults when called from a function that doesn't align RSP). So in 32-bit code it's common that it will happen to work anyway, but could break in some future Linux distro, exactly the kind of invisible bug you were asking about. Assembly language is not one where trial and error can prove your code is safe and correct.

You have 16 bytes of stack adjustment before the call, rather than 12 + 16*n. (4 each from push ebp and sub esp,4, then 2 more pushes of args for scanf). You could drop the use of EBP as a frame pointer (and adjust the 2 instructions that referenced stack space to use ESP instead), or sub esp,16 instead of 4.

You might be tempted to ask scanf to overwrite one of its args with the conversion result. That would probably be safe in practice, especially the int * since it needs the pointer in a register to store through it, and it has no reason to read it again after. But functions own their incoming stack args, and can re-read them any number of times, and assume that no pointers alias them. scanf could in theory be reusing its incoming arg space for its own temporaries. Or more plausibly, as args for a tailcall; reusing arg space that way is something real compilers do.

mov esp, ebp is not redundant per-se; ESP is still 4 bytes below EBP at that point, as you can see from single-stepping your code with gdb and watching registers change with layout reg. To avoid needing it, you'd have to change other instructions. e.g. mov eax, [ebp - 4] could be pop eax.

It would not be safe to change add esp,8 to add esp,12 unless you move it after the mov eax, [ebp - 4]. Any space below ESP can be trashed asynchronously, e.g. by a signal handler if you had one. Only x86-64 System V has a red-zone below the stack pointer that's safe from that.

There is one instruction in your code that's fully redundant, add esp, 8. You're about to reset ESP to EBP, popping all stack allocations including the arg space. (If you'd been making multiple function calls, you could let some args accumulate or reusing their space with mov stores instead of pushes and pops. But push and pop are good for machine code density.)

The format string doesn't need to be in read-write .data; it can be in .rodata. (Or you could push-immediate and push esp, but it's normal to just put strings in .rodata even when they're tiny.)

If I was writing this for efficiency and minimalism, not doing stuff that isn't required or even useful for this specific function, but not aiming for simplicity and easiest to understand, I might have written this:

extern scanf

section .rodata
fmt: db "%d", 0       ; style: plain 0 seems appropriate to me as a terminator byte

section .text

global main           ; I prefer putting global next to the label, like in C how  static is with a function definition, not at the top of the file, but either is valid
main:
  ; ESP % 16 == 12 on entry
    push  -1       ; allocate int x, with a value in case the scanf conversion fails
    push  esp      ; &x.  push esp snapshots the input register before the -= 4 part of push
    push  fmt      ; push format string on stack (4 byte address)
    call scanf     ; 3x 4-byte pushes, ESP % 16 == 0 ready for a call
    ; TODO: check EAX==1 to make sure the conversion succeeded.

    add  esp, 8        ; clean the args from the stack
    pop  eax           ; load the %d conversion result.  (Or the value we pushed earlier if scanf didn't write it)

    ; epilogue
    ; nothing to do here, we didn't save any call-preserved registers
    ret

See What is an assembly-level representation of pushl/popl %esp? re: the details of what happens when you push esp: it pushes the old value of ESP, snapshotted before the esp-=4 effect of the push itself.

BTW, this is borderline code-review. Code review questions are off-topic on SO these days, but there is https://codereview.stackexchange.com/. This is so short and doing so little that I think it's ok as an SO question.