c++ linux segmentation-fault gdb address-sanitizer

AddressSanitizer randomly throws SIGSEGV with no explanation


Project

I have a game project in C++ that I'm currently developing. I compile every source file with -g3 -std=c++2a -Wall ... -fsanitize=address -fsanitize=leak to check for leaks and segfaults.

The main problem

The problem is that, randomly (about 1 run in 5), asan (address or leak) terminates the program with a SIGSEGV before it reaches main, without any diagnostics.

AddressSanitizer:DEADLYSIGNAL
=================================================================
==28573==ERROR: AddressSanitizer: SEGV on unknown address 0x625505a4ce68 (pc 0x7cc52585f38f bp 0x000000000000 sp 0x7fff63949020 T0)
==28573==The signal is caused by a READ memory access.
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer: nested bug in the same thread, aborting.

The address the SEGV happens on is always different, as is the pc, except for the last 3 hex digits (e68 and 38f respectively).

The system it runs on

My machine runs Arch Linux 6.7.0-arch3-1, and I'm using g++ (GCC) 13.2.1 20230801 and GNU gdb (GDB) 13.2, which are the latest versions in the repositories at the time of writing.

What I've tried

I have no idea how to hunt down this bug, nor what might be causing it.

In code

I am sure the problem happens before main, since printing something (with cout or printf) has no effect. The same goes for installing a signal handler with signal(SIGSEGV, &handle);

asan is part of it

Without asan the SEGV does not happen. (I have tried about 50 times and the program started correctly every time.)

gdb

Running the program compiled with asan under gdb, with gdb's disabling of ASLR turned off (set disable-randomization off), reproduces the SIGSEGV, and gdb catches it automatically.

assembly instruction of the problem

Given the strange pattern of addresses the problem happens on, I tried using a watchpoint on any $pc ending with 38f (watch ((size_t)$pc & 0xfff) == 0x38f). The watchpoint works: the address in question is inside a libc function (do_lookup_x or similar) that is seemingly called thousands of times before main begins, which makes debugging this way practically a nightmare.

The question

I would like to ask if anybody has any idea on how to get more information out of asan, gdb, or any other tool, because at this moment I do not have enough information to know where the problem happens or even if the problem is mine or not.


Updates

@marekR and @eljay suggested some kind of symbol collision with glibc function names. Most of my definitions are enclosed in a namespace (and thus also name mangled), and the only functions generic enough to collide with another name are init(), loop(), and terminate(). Changing their names did not solve the issue.

Following @ÖöTiib's suggestion, I tested my git history with git bisect. The problem has been present since the first commit, back in 2019. This means that either it has gone unnoticed all this time (I'm the only one working on this project, but that seems unlikely), it is caused by a combination of factors local to my machine, or it is something else.


Solution

  • Thanks to @EmployedRussian I was able to track down the origin of the bug. Since that was the point of this question, I'd close this post.

    I will try to fix the bug myself and, if I can't, open another question or a report on the asan bug tracker.

    In any case, thank you for helping me.

    For anyone interested: compiling the binary with -fsanitize=address and running it under gdb with set disable-randomization off can reproduce the SIGSEGV, and gdb should catch it automatically.

    I'd consider this question closed.