
Minimal 64-bit Windows executable crashes with tail-call optimization enabled by gcc


I'm trying to create a minimal 64-bit Windows executable to better understand how the Windows executable format works.

I wrote very basic assembly and C code as follows.

hi.s

    section .text

hi:
    db "hi", 0

    global sayHi
    align 16
sayHi:
    lea rax, [rel hi]
    ret

start.c

extern int puts();
extern const char *sayHi();

void start() {
    puts(sayHi());
}

compiled with,

nasm -fwin64 hi.s
gcc -c -ostart.obj -O3 -fno-optimize-sibling-calls start.c
# I will explain the flag

and linked with,

golink /fo r.exe /console start.obj hi.obj msvcrt.dll
# create a console application `r.exe`
# the default entry point is `start`

The program runs fine and prints hi, but note the gcc flag -fno-optimize-sibling-calls. That flag disables tail-call optimization, so the function keeps its stack frame and reaches puts through a call followed by its own ret. Without the flag, the program crashes.

This is the disassembled result without tail-call optimization. Not sure why gcc put a nop there, but otherwise it's very simple and runs fine.

0000000000401000 <.text>:
  401000:   48 83 ec 28             sub    rsp,0x28
  401004:   e8 27 00 00 00          call   0x401030 # sayHi
  401009:   48 89 c1                mov    rcx,rax
  40100c:   e8 ff 2f 00 00          call   0x404010 # puts
  401011:   90                      nop
  401012:   48 83 c4 28             add    rsp,0x28
  401016:   c3                      ret    
  ...
  401020:   68 69 00 90 90          push   0xffffffff90900069 # "hi"
  ...
  401030:   48 8d 05 e9 ff ff ff    lea    rax,[rip+0xffffffffffffffe9] # 0x401020
  401037:   c3                      ret    

This is the result with tail-call optimization enabled, in which case the program crashes.

0000000000401000 <.text>:
  401000:   48 83 ec 28             sub    rsp,0x28
  401004:   e8 27 00 00 00          call   0x401030 # sayHi
  401009:   48 89 c1                mov    rcx,rax
  40100c:   48 83 c4 28             add    rsp,0x28
  401010:   e9 eb 2f 00 00          jmp    0x404000 # puts
  ...
  401020:   68 69 00 90 90          push   0xffffffff90900069 # "hi"
  ...
  401030:   48 8d 05 e9 ff ff ff    lea    rax,[rip+0xffffffffffffffe9] # 0x401020
  401037:   c3                      ret    

Now the program releases its stack space before reaching puts and simply does a jmp instead of a call.

I investigated further to see where exactly it jumps when calling puts.

In the no-tail-call case, the called address 0x404010 in the .idata section holds the instruction jmp QWORD PTR [rip+0xffffffffffffffea] # 0x404000, and 0x404000 seems to contain the address of puts.

However, in the tail-call case, the jump target 0x404000 holds 54 40 00 00, which is not a meaningful instruction. The debugger says the program segfaults at 0x404003, so I'm pretty sure the program chokes trying to execute a garbage instruction.
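
The addresses above can be re-derived from the instruction bytes in the two listings. The following is a minimal sketch, not part of the original build: `branch_target` is a hypothetical helper name, and the 6-byte length used for the thunk is an assumption (the usual `ff 25 disp32` encoding), since the question does not show the thunk's bytes. A relative branch lands at the address of the next instruction plus the sign-extended 32-bit displacement.

```shell
#!/bin/sh
# usage: branch_target <insn_addr> <insn_len> <disp32>
# Prints the address a RIP-relative instruction refers to:
# (address of the next instruction) + (sign-extended 32-bit displacement).
branch_target() {
    # Sign-extend the 32-bit displacement to the shell's wider integer type.
    disp=$(( ($3 ^ 0x80000000) - 0x80000000 ))
    printf '0x%x\n' $(( $1 + $2 + $disp ))
}

# call puts, bytes e8 ff 2f 00 00 at 0x40100c (no tail call)
branch_target 0x40100c 5 0x2fff       # prints 0x404010, the jmp thunk

# jmp puts, bytes e9 eb 2f 00 00 at 0x401010 (tail call)
branch_target 0x401010 5 0x2feb       # prints 0x404000, the pointer slot itself

# Memory operand of the thunk at 0x404010; the 6-byte length is an assumption.
branch_target 0x404010 6 0xffffffea   # prints 0x404000
```

So the call goes through the thunk, which then jumps indirectly through the pointer stored at 0x404000, while the tail-call jmp lands directly on that pointer and executes its bytes as code.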

I must be doing something wrong, but I'm not sure what, so could you explain why the tail-call case fails and how to get it to work?


Solution

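  • The problem was that golink did not handle the tail call correctly: it resolved the jmp to the pointer slot at 0x404000 instead of the jmp thunk at 0x404010 that the call goes through. After some searching, I found how to link the program with GNU ld using options equivalent to those given to golink.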

    You can create a console-mode Windows executable with GNU ld using this command.

    ld -o... --subsystem=console object-files...
    

    --subsystem console and -subsystem=console mean the same thing. Use --subsystem=windows to create a GUI application.

    GNU ld can also link directly against Windows DLL files, so in this case simply giving ld a copy of msvcrt.dll from the system folder worked.