
Root cause a segmentation fault


Background

I built qemu-system-x86_64.exe on a Windows machine using MSYS2 (x86_64), and I'm debugging a segmentation fault that occurs when I try to run it.
I don't think the problem is specific to QEMU or MSYS2; it is really a question of how to debug a segmentation fault, possibly one caused by wrong code generation.

Debugging the Segmentation Fault

The program crashes with a segmentation fault right at startup.
Running it under gdb shows the following:

Starting program: C:\msys64\home\Administrator\qemu\x86_64-softmmu\qemu-system-x86_64.exe
[New Thread 4656.0x1194]

Program received signal SIGSEGV, Segmentation fault.
0x00000000007d3254 in getpagesize () at util/oslib-win32.c:535
535     {

(gdb) bt
#0  0x00000000007d3254 in getpagesize () at util/oslib-win32.c:535
#1  0x000000000086dd39 in init_real_host_page_size () at util/pagesize.c:16
#2  0x00000000007ea1b2 in __do_global_ctors ()
    at C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/gccmain.c:67
#3  0x00000000007ea20f in __main ()
    at C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/gccmain.c:83
#4  0x000000000040137f in __tmainCRTStartup ()
    at C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:329
#5  0x00000000004014db in WinMainCRTStartup ()
    at C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:195

This is strange.
The program crashes while __do_global_ctors is running: it calls init_real_host_page_size(), which in turn calls getpagesize(). These are really simple functions:

uintptr_t qemu_real_host_page_size;
intptr_t qemu_real_host_page_mask;

static void __attribute__((constructor)) init_real_host_page_size(void)
{
    qemu_real_host_page_size = getpagesize();
    qemu_real_host_page_mask = -(intptr_t)qemu_real_host_page_size;
}

...

int getpagesize(void)
{
    SYSTEM_INFO system_info;

    GetSystemInfo(&system_info);
    return system_info.dwPageSize;
}
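To make the startup order in the backtrace concrete, here is a minimal sketch of the same pattern. It assumes GCC or Clang (the constructor attribute is a compiler extension), and every demo_ name is a hypothetical stand-in rather than QEMU code; demo_getpagesize() returns a fixed value instead of calling GetSystemInfo so that the sketch is portable.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for QEMU's globals; the names are illustrative only. */
static uintptr_t demo_page_size;
static intptr_t demo_page_mask;

/* Stand-in for getpagesize(): a fixed value keeps the sketch portable. */
static int demo_getpagesize(void)
{
    return 4096;
}

/* GCC and Clang run functions marked "constructor" during program startup,
 * before main() is entered. On MinGW this happens via __do_global_ctors. */
static void __attribute__((constructor)) demo_init_page_size(void)
{
    demo_page_size = (uintptr_t)demo_getpagesize();
    demo_page_mask = -(intptr_t)demo_page_size;
}

/* Intended to be called from main(): by then the constructor has already run,
 * so the globals are populated without any explicit initialization call. */
uintptr_t demo_report_page_size(void)
{
    assert(demo_page_size != 0);
    assert(demo_page_mask == -(intptr_t)demo_page_size);
    printf("page size: %llu\n", (unsigned long long)demo_page_size);
    return demo_page_size;
}
```

Calling demo_report_page_size() as the first statement of main() should already see the initialized values, which is why a crash inside such a constructor happens before main() is ever reached.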

getpagesize() crashes right at the beginning of the function, before it even calls GetSystemInfo.
Here is the disassembly of that function, followed by the register values:

(gdb) disassem
Dump of assembler code for function getpagesize:
   0x00000000007d3250 <+0>:     sub    $0x68,%rsp
=> 0x00000000007d3254 <+4>:     mov    %fs:0x0,%rax
   0x00000000007d325d <+13>:    mov    %rax,0x58(%rsp)
   0x00000000007d3262 <+18>:    xor    %eax,%eax
   0x00000000007d3264 <+20>:    lea    0x20(%rsp),%rcx
   0x00000000007d3269 <+25>:    callq  *0x68e8b9(%rip)        # 0xe61b28 <__imp_GetSystemInfo>
   0x00000000007d326f <+31>:    mov    0x24(%rsp),%eax
   0x00000000007d3273 <+35>:    mov    0x58(%rsp),%rdx
   0x00000000007d3278 <+40>:    xor    %fs:0x0,%rdx
   0x00000000007d3281 <+49>:    jne    0x7d3288 <getpagesize+56>
   0x00000000007d3283 <+51>:    add    $0x68,%rsp
   0x00000000007d3287 <+55>:    retq
   0x00000000007d3288 <+56>:    callq  0x85bde0 <__stack_chk_fail>
   0x00000000007d328d <+61>:    nop
End of assembler dump.
(gdb) info registers
rax            0x6f4b868           116701288
rbx            0x86ec10            8842256
rcx            0x6f4b8b8           116701368
rdx            0xe5a780            15050624
rsi            0x86e220            8839712
rdi            0x6f4ad50           116698448
rbp            0x6f4ad10           0x6f4ad10
rsp            0x22fd80            0x22fd80
r8             0x0                 0
r9             0x0                 0
r10            0x5000016b          1342177643
r11            0x22f9d8            2292184
r12            0x0                 0
r13            0x10                16
r14            0x0                 0
r15            0x0                 0
rip            0x7d3254            0x7d3254 <getpagesize+4>
eflags         0x10202             [ IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x2b                43
es             0x2b                43
fs             0x53                83
gs             0x2b                43

It looks like something is wrong with the memory access mov %fs:0x0,%rax.
Who sets FS to 83?

(gdb) starti
Starting program: C:\msys64\home\Administrator\qemu\x86_64-softmmu\qemu-system-x86_64.exe
[New Thread 3508.0x14b0]

Program stopped.
0x00000000778b6fb1 in ntdll!CsrSetPriorityClass ()
   from C:\Windows\SYSTEM32\ntdll.dll
(gdb) p $fs
$1 = 83
(gdb) watch $fs
Watchpoint 1: $fs
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00000000007d3254 in getpagesize () at util/oslib-win32.c:535
535     {

No one sets FS!

Questions

  • GCC generated code that uses an uninitialized register. What could cause that? Was there some initialization code that should have run but didn't?
  • Any ideas how I can debug this issue further?

Solution

  • FS is an x86 segment register. Segment registers are generally not set by the user program; the OS or the runtime libraries set them for various special purposes. For instance, on Windows x86-64, GS is used to point to a per-thread data block: https://en.wikipedia.org/wiki/Win32_Thread_Information_Block (and FS is not used).

    In this case the problem is a bug in the GCC 8 compiler you are using: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86832

    In some situations this compiler generates code that assumes FS has been set up for "native TLS". That is wrong because MinGW does not support "native TLS", so FS is not set to anything useful.

    The workaround is to avoid compiling with the -fstack-protector-strong compiler option. For QEMU you can do that by passing the --disable-stack-protector flag to configure.

    (PS: if you want to know how I identified the cause of this segfault: I googled for 'qemu-devel sigsegv getpagesize', which brings up a mailing list thread where somebody else ran into and reported the bug. In that thread the problem was diagnosed and a link to the GCC bug was found.)
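For readers wondering what the instructions around the crash are for, the following is a rough hand-written emulation of the stack-protector pattern visible in the disassembly (load a guard value, keep a copy in the frame, compare before returning). It is only a sketch under assumptions: demo_guard, demo_copy_with_canary and demo_self_check are hypothetical names, the guard is an ordinary global rather than something read through a segment register, and a clobbered canary is reported by return value instead of calling __stack_chk_fail.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical guard value. In the question's binary the compiler tries to
 * load the guard with `mov %fs:0x0,%rax`, which is the faulting instruction. */
static uintptr_t demo_guard = 0x5bd1e995u;

/* Returns 0 if the canary survived the copy and 1 if it was overwritten.
 * Compiler-generated code would call __stack_chk_fail() in the second case. */
int demo_copy_with_canary(const char *src, size_t len)
{
    struct {
        char buf[16];
        uintptr_t canary;   /* plays the role of the slot at 0x58(%rsp) */
    } frame;

    frame.canary = demo_guard;      /* "prologue": save a copy of the guard */

    /* Deliberately sloppy copy: it may run past buf into the canary.
     * It is clamped to the struct so the demo itself stays well-defined. */
    if (len > sizeof frame) {
        len = sizeof frame;
    }
    memcpy(&frame, src, len);

    /* "Epilogue": same xor-and-test as at <+40> and <+49> in the disassembly. */
    return (frame.canary ^ demo_guard) != 0;
}

/* Small usage example: one copy that fits, one that overruns the buffer. */
void demo_self_check(void)
{
    assert(demo_copy_with_canary("short", 6) == 0);
    assert(demo_copy_with_canary("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA", 32) == 1);
}
```

The point of the sketch is that the check depends entirely on being able to read the guard value. If that first load goes through a segment register that nothing has set up, the function faults before its body runs, which matches the crash at getpagesize+4.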