C++ programs fail with ASAN (AddressSanitizer:DEADLYSIGNAL)


I have been using ASAN with C++ for a long time, but now any program I link with ASAN dies with AddressSanitizer:DEADLYSIGNAL as soon as I run it. For example:

p.cpp:

int main() { return 0; }

I compile it:

$ c++ -o p p.cpp -fsanitize=address -fsanitize=undefined

I execute it:

$ ./p
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
Segmentation fault (core dumped)

This started happening on some machines. An executable built with ASAN on a machine where ASAN fails runs fine on a machine where ASAN works, so the problem does not seem to be in the executable itself.

With a single sanitizer, either -fsanitize=undefined or -fsanitize=address, the program also works; only the combination of both fails. With no sanitizer at all it works fine as well.

My setup is:

$ c++ --version
c++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Under Ubuntu 22.04.4.

gdb cannot debug the program (LeakSanitizer refuses to run under ptrace), but here is its output for what it's worth:

$ gdb p
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from p...
(gdb) run
Starting program: /tmp/p 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
==432014==LeakSanitizer has encountered a fatal error.
==432014==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
==432014==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)
[Inferior 1 (process 432014) exited with code 01]
(gdb) bt
No stack.
(gdb) 

Later notes:

  • this bug is perfectly reproducible on a fresh Ubuntu 22.04 install with g++ installed.
  • on some machines it takes a few tries to reproduce.
  • at this point, I do not have a single Ubuntu 22.04 machine where this bug does not reproduce.

Solution

  • EDIT: This is the same issue as "Possible Bug in GCC Sanitizers?". See that thread for a more detailed explanation of the root cause.

    Original Response

    I ran some tests and found that this is related to ASLR (address space layout randomization). With ASLR disabled (echo 0 | sudo tee /proc/sys/kernel/randomize_va_space) the program no longer crashes, or at least it did not in more than 20,000 runs. This also explains why it doesn't crash every time: whether it crashes apparently depends on where in the address space the libraries get loaded.
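
To put a number on how often the crash occurs under a given setting, a loop along the following lines can help. This is only a sketch: count_failures is a hypothetical helper name, and it simply counts runs that exit with a non-zero status (a run killed by a segfault counts as one).

```shell
# Run a command N times and print how many runs exited with a non-zero status.
# count_failures is a hypothetical helper name used only for this sketch.
count_failures() {
    runs=$1
    shift
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the program's output; only the exit status matters here.
        "$@" >/dev/null 2>&1 || failed=$((failed + 1))
        i=$((i + 1))
    done
    echo "$failed"
}
```

For example, `count_failures 1000 ./p` prints how many of 1000 runs of ./p failed, which makes it easy to compare the result with ASLR enabled and disabled.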

    As a workaround you can disable ASLR on your development systems, but I would not recommend that in production, as ASLR is a useful security feature.

    This also gives a hint as to why it doesn't occur in an Ubuntu Docker container on another distro, because the kernel actually handles parts of ASLR, and a Docker container still runs on the kernel of the other distribution.

    My guess is that Ubuntu applies some kernel patch that affects ASLR differently than other distributions do, which is why this only occurs with Ubuntu VMs and direct installations, but not in e.g. Docker. That doesn't necessarily mean the Ubuntu patch is faulty: there could be a bug in glibc and/or libasan that is only triggered by the conditions the Ubuntu kernel creates. (Or they did indeed introduce a bug in the kernel.)

    I'll have a closer look at this issue on the weekend, but I hope this workaround helps you out in the meantime.

    (See also this Ask Ubuntu question about ASLR for details on how to disable it.)
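
Before and after applying the workaround, it can be useful to check what the setting currently is. The following is a minimal sketch; aslr_mode is a hypothetical helper name, and the meanings of the values 0, 1 and 2 are taken from the kernel's sysctl documentation for randomize_va_space.

```shell
# Describe a randomize_va_space value.
# aslr_mode is a hypothetical helper name used only for this sketch.
aslr_mode() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "partial (stack, mmap base, vDSO)" ;;
        2) echo "full (also heap)" ;;
        *) echo "unknown" ;;
    esac
}

setting=/proc/sys/kernel/randomize_va_space
if [ -r "$setting" ]; then
    echo "ASLR is currently: $(aslr_mode "$(cat "$setting")")"
fi

# To turn ASLR back on after testing (2 is the usual default):
#   echo 2 | sudo tee /proc/sys/kernel/randomize_va_space
```

Note that a value written with tee does not survive a reboot, so the system returns to its configured default on the next boot.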

    Update (Better Workaround)

    I've looked into this a bit further, and while I still haven't found the root cause, I've found another workaround that doesn't require you to disable ASLR on your system.

    What you can do is invoke the executable with the runtime linker directly, i.e. run

    /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 ./p

    (This assumes x86_64. On a different architecture, run readelf -l on the executable and check which program it lists as the interpreter in the INTERP program header.)
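
To avoid hard-coding the interpreter path, it can be read from the executable itself. The sketch below assumes readelf (from binutils) is installed and that its output contains the usual "Requesting program interpreter" line; interp_of and run_via_interp are hypothetical helper names.

```shell
# Print the interpreter path recorded in an ELF file's INTERP program header.
# readelf -l prints a line such as:
#   [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
interp_of() {
    readelf -l "$1" 2>/dev/null |
        sed -n 's/.*Requesting program interpreter: \([^]]*\)].*/\1/p'
}

# Run a program through its own interpreter instead of executing it directly.
run_via_interp() {
    prog=$1
    shift
    interp=$(interp_of "$prog")
    if [ -z "$interp" ]; then
        echo "no interpreter found in $prog (missing file or static binary?)" >&2
        return 1
    fi
    "$interp" "$prog" "$@"
}
```

With these helpers, `run_via_interp ./p` is equivalent to the command above on x86_64, and should pick the right loader on other architectures as well.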

    Background

    When you run a program, the kernel checks the ELF header of the program for a so-called interpreter, also called the runtime dynamic linker, or ld.so. If that is present, it also loads that program into memory (in addition to the executable), and gives control to that program (instead of directly to the executable). The runtime dynamic linker is then responsible for actually loading all of the shared libraries the program is linked against. Once all shared libraries have been loaded, ld.so then invokes the startup routine of the program itself. (A statically linked program doesn't require an interpreter, and is executed directly by the kernel.)

    But instead of letting the kernel load ld.so when your program is being started, you can also call ld.so directly and specify the program that is to be run as a command line argument to it. (That's a glibc feature.) This will cause ld.so to load the program itself (instead of relying on the kernel to have previously loaded it before loading ld.so), but after that it behaves like any other program.

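    This also explains why the issue only occurs with Ubuntu's own kernels and not with other kernels running Docker containers, and why the kernel is a factor with respect to ASLR here: normally the kernel itself maps both the executable and ld.so into memory, whereas with the workaround ld.so maps the executable.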