Tags: c, segmentation-fault, gdb, coredump, core-file

"Segmentation fault (core dumped)" for: "No such file or directory" for libioP.h, printf-parse.h, vfprintf-internal.c, etc


Sample errors in the core dump files:

1289    vfprintf-internal.c: No such file or directory.
111 printf-parse.h: No such file or directory.
948 libioP.h: No such file or directory.
948 libioP.h: No such file or directory.

I'm working on a fast_malloc() implementation, but getting segmentation faults for unknown reasons once I override malloc() and free() with my own implementations, but NOT before that (meaning, if I call fast_malloc() it's fine, but if I want to be able to call malloc() to get my implementation, it seems to be broken).

Why the segfault?

Sample output, before ANYTHING can be printed, including the print statement at the start of main(), and some debug prints inside my fast_malloc():

Segmentation fault (core dumped)

I have turned on core dumps as I explain here.

So, gdb path/to/my/executable core shows some of the following core file info. Note that each run may result in a different statement for what file is missing in "No such file or directory."

  1. One run:
Reading symbols from build/fast_malloc_unit_tests...

warning: core file may not match specified executable file.
[New LWP 1257155]
Core was generated by `build/fast_malloc_unit_tests'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fd50fc7ba01 in __vfprintf_internal (s=0x7fd50fdee6a0 <_IO_2_1_stdout_>, 
    format=0x5622fd1b8010 "DEBUG: %s():\n", ap=ap@entry=0x7ffec28300a0, 
    mode_flags=mode_flags@entry=0) at vfprintf-internal.c:1289
1289    vfprintf-internal.c: No such file or directory.
(gdb) bt
#0  0x00007fd50fc7ba01 in __vfprintf_internal (s=0x7fd50fdee6a0 <_IO_2_1_stdout_>, 
    format=0x5622fd1b8010 "DEBUG: %s():\n", ap=ap@entry=0x7ffec28300a0, 
    mode_flags=mode_flags@entry=0) at vfprintf-internal.c:1289
#1  0x00007fd50fc66ebf in __printf (format=<optimized out>) at printf.c:33
#2  0x00005622fd1b53eb in fast_malloc (num_bytes=1024) at src/fast_malloc.c:225
#3  0x00005622fd1b5b66 in malloc (num_bytes=1024) at src/fast_malloc.c:496
#4  0x00007fd50fc86e84 in __GI__IO_file_doallocate (fp=0x7fd50fdee6a0 <_IO_2_1_stdout_>)
    at filedoalloc.c:101
#5  0x00007fd50fc97050 in __GI__IO_doallocbuf (fp=fp@entry=0x7fd50fdee6a0 <_IO_2_1_stdout_>)
    at libioP.h:948
#6  0x00007fd50fc960b0 in _IO_new_file_overflow (f=0x7fd50fdee6a0 <_IO_2_1_stdout_>, ch=-1)
    at fileops.c:745
#7  0x00007fd50fc94835 in _IO_new_file_xsputn (n=7, data=<optimized out>, f=<optimized out>)
    at libioP.h:948
#8  _IO_new_file_xsputn (f=0x7fd50fdee6a0 <_IO_2_1_stdout_>, data=<optimized out>, n=7)
    at fileops.c:1197
#9  0x00007fd50fc7baf2 in __vfprintf_internal (s=0x7fd50fdee6a0 <_IO_2_1_stdout_>, 
    format=0x5622fd1b8010 "DEBUG: %s():\n", ap=ap@entry=0x7ffec28308e0, 
    mode_flags=mode_flags@entry=0) at ../libio/libioP.h:948
#10 0x00007fd50fc66ebf in __printf (format=<optimized out>) at printf.c:33
#11 0x00005622fd1b53eb in fast_malloc (num_bytes=1024) at src/fast_malloc.c:225
#12 0x00005622fd1b5b66 in malloc (num_bytes=1024) at src/fast_malloc.c:496
--Type <RET> for more, q to quit, c to continue without paging--q
Quit
(gdb) q

  2. Another one:
Reading symbols from build/fast_malloc_unit_tests...

warning: core file may not match specified executable file.
[New LWP 1257787]
Core was generated by `build/fast_malloc_unit_tests'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f20b0bbba80 in __find_specmb (
    format=0x5644c516d108 "DEBUG:   block_map_i = %zu (num_bytes requested to allocate = %zu; smallest user block size large enough = %zu)\n") at printf-parse.h:111
111 printf-parse.h: No such file or directory.
(gdb) bt
#0  0x00007f20b0bbba80 in __find_specmb (
    format=0x5644c516d108 "DEBUG:   block_map_i = %zu (num_bytes requested to allocate = %zu; smallest user block size large enough = %zu)\n") at printf-parse.h:111
#1  __vfprintf_internal (s=0x7f20b0d2e6a0 <_IO_2_1_stdout_>, 
    format=0x5644c516d108 "DEBUG:   block_map_i = %zu (num_bytes requested to allocate = %zu; smallest user block size large enough = %zu)\n", ap=ap@entry=0x7ffe7f6ea580, mode_flags=mode_flags@entry=0)
    at vfprintf-internal.c:1365
#2  0x00007f20b0ba6ebf in __printf (format=<optimized out>) at printf.c:33
#3  0x00005644c516a47d in fast_malloc (num_bytes=1024) at src/fast_malloc.c:244
#4  0x00005644c516ab4e in malloc (num_bytes=1024) at src/fast_malloc.c:496
#5  0x00007f20b0bc6e84 in __GI__IO_file_doallocate (fp=0x7f20b0d2e6a0 <_IO_2_1_stdout_>)
    at filedoalloc.c:101
#6  0x00007f20b0bd7050 in __GI__IO_doallocbuf (fp=fp@entry=0x7f20b0d2e6a0 <_IO_2_1_stdout_>)
    at libioP.h:948
#7  0x00007f20b0bd60b0 in _IO_new_file_overflow (f=0x7f20b0d2e6a0 <_IO_2_1_stdout_>, ch=-1)
    at fileops.c:745
#8  0x00007f20b0bd4835 in _IO_new_file_xsputn (n=23, data=<optimized out>, f=<optimized out>)
    at libioP.h:948
#9  _IO_new_file_xsputn (f=0x7f20b0d2e6a0 <_IO_2_1_stdout_>, data=<optimized out>, n=23)
    at fileops.c:1197
#10 0x00007f20b0bbbaf2 in __vfprintf_internal (s=0x7f20b0d2e6a0 <_IO_2_1_stdout_>, 
    format=0x5644c516d108 "DEBUG:   block_map_i = %zu (num_bytes requested to allocate = %zu; smallest--Type <RET> for more, q to quit, c to continue without paging--q
Quit
(gdb) q

  3. Another:
Reading symbols from build/fast_malloc_unit_tests...

warning: core file may not match specified executable file.
[New LWP 1258037]
Core was generated by `build/fast_malloc_unit_tests'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f901ef65e4d in __GI__IO_file_doallocate (fp=0x7f901f0cd6a0 <_IO_2_1_stdout_>)
    at libioP.h:948
948 libioP.h: No such file or directory.
(gdb) q

  4. Another:
Reading symbols from build/fast_malloc_unit_tests...

warning: core file may not match specified executable file.
[New LWP 1258336]
Core was generated by `build/fast_malloc_unit_tests'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f5e4b551a80 in __find_specmb (
    format=0x562fac6d7108 "DEBUG:   block_map_i = %zu (num_bytes requested to allocate = %zu; smallest user block size large enough = %zu)\n") at printf-parse.h:111
111 printf-parse.h: No such file or directory.
(gdb) q

My gcc build options at the moment:

-Wall -Wextra -Werror -O0 -ggdb -std=c11 -save-temps=obj -DDEBUG

Possibly related to this DEBUG_PRINTF() macro I have, which I call inside fast_malloc().

#ifdef DEBUG
    /// Debug printf function.
    /// See: https://stackoverflow.com/a/1941336/4561887
    #define DEBUG_PRINTF(...) printf("DEBUG: "__VA_ARGS__)
#else
    #define DEBUG_PRINTF(...) \
        do                    \
        {                     \
        } while (0)
#endif

Why is malloc() getting called before the program starts anyway? I don't call it anywhere. But, notice you can see malloc() getting called with 1024 bytes as visible in the stack traces in runs 1 and 2 (though it happens every run, those are the ones I have pasted enough you can see it in).

My malloc() and free() overrides look like this:

inline void* malloc(size_t num_bytes)
{
    return fast_malloc(num_bytes);
}

inline void free(void* ptr)
{
    fast_free(ptr);
}

Is my single-threaded program where malloc() is mysteriously getting called without me calling it somehow multi-threaded at startup? Does some weird program initialization stuff take place? My fast_malloc() implementation is currently NOT thread safe, so if Linux is doing some weird multi-threaded malloc() calls during some kind of program initialization or something, that could be the cause of the corruption, as again, fast_malloc(), which overrides malloc(), is NOT yet threadsafe.

It seems to be related to printing inside malloc(). Is printing inside malloc() forbidden?

Here is the bottom (first call is at very bottom) of a recent stack trace from a core dump:

#127471 0x00005626d43dca28 in malloc (num_bytes=1024) at src/fast_malloc.c:494
#127472 0x00007faa222a7e84 in __GI__IO_file_doallocate (fp=0x7faa2240f6a0 <_IO_2_1_stdout_>) at filedoalloc.c:101
#127473 0x00007faa222b8050 in __GI__IO_doallocbuf (fp=fp@entry=0x7faa2240f6a0 <_IO_2_1_stdout_>) at libioP.h:948
#127474 0x00007faa222b70b0 in _IO_new_file_overflow (f=0x7faa2240f6a0 <_IO_2_1_stdout_>, ch=-1) at fileops.c:745
#127475 0x00007faa222b5835 in _IO_new_file_xsputn (n=13, data=<optimized out>, f=<optimized out>) at libioP.h:948
#127476 _IO_new_file_xsputn (f=0x7faa2240f6a0 <_IO_2_1_stdout_>, data=<optimized out>, n=13) at fileops.c:1197
#127477 0x00007faa222aa678 in __GI__IO_puts (str=0x5626d43df227 '=' <repeats 13 times>) at libioP.h:948
#127478 0x00005626d43dca28 in malloc (num_bytes=1024) at src/fast_malloc.c:494
#127479 0x00007faa222a7e84 in __GI__IO_file_doallocate (fp=0x7faa2240f6a0 <_IO_2_1_stdout_>) at filedoalloc.c:101
#127480 0x00007faa222b8050 in __GI__IO_doallocbuf (fp=fp@entry=0x7faa2240f6a0 <_IO_2_1_stdout_>) at libioP.h:948
#127481 0x00007faa222b70b0 in _IO_new_file_overflow (f=0x7faa2240f6a0 <_IO_2_1_stdout_>, ch=-1) at fileops.c:745
#127482 0x00007faa222b5835 in _IO_new_file_xsputn (n=13, data=<optimized out>, f=<optimized out>) at libioP.h:948
#127483 _IO_new_file_xsputn (f=0x7faa2240f6a0 <_IO_2_1_stdout_>, data=<optimized out>, n=13) at fileops.c:1197
#127484 0x00007faa222aa678 in __GI__IO_puts (str=0x5626d43df227 '=' <repeats 13 times>) at libioP.h:948
#127485 0x00005626d43dca28 in malloc (num_bytes=1024) at src/fast_malloc.c:494
#127486 0x00007faa222a7e84 in __GI__IO_file_doallocate (fp=0x7faa2240f6a0 <_IO_2_1_stdout_>) at filedoalloc.c:101
#127487 0x00007faa222b8050 in __GI__IO_doallocbuf (fp=fp@entry=0x7faa2240f6a0 <_IO_2_1_stdout_>) at libioP.h:948
#127488 0x00007faa222b70b0 in _IO_new_file_overflow (f=0x7faa2240f6a0 <_IO_2_1_stdout_>, ch=-1) at fileops.c:745
#127489 0x00007faa222b5835 in _IO_new_file_xsputn (n=49, data=<optimized out>, f=<optimized out>) at libioP.h:948
#127490 _IO_new_file_xsputn (f=0x7faa2240f6a0 <_IO_2_1_stdout_>, data=<optimized out>, n=49) at fileops.c:1197
#127491 0x00007faa222aa678 in __GI__IO_puts (str=0x5626d43df238 "Running UNIT tests for the \"fast_malloc\" module.\n") at libioP.h:948
#127492 0x00005626d43dca98 in main () at src/fast_malloc_unit_tests.c:35
(gdb) 

What are __GI__IO_puts and _IO_new_file_xsputn and those other function calls as you move up? Are they calls in other threads? Are they calling malloc() behind-the-scenes? It appears __GI__IO_file_doallocate is...


Solution

  • To follow up and answer my own question: @Employed Russian's answer appears to be correct.

    To be more specific, I had two main problems:

    1. Infinite recursion between malloc() and printf().
    2. Data corruption by freeing and reusing memory the system thinks it has exclusive access to.

    The 1st problem: infinite recursion

    I call printf() to do some debug prints inside my fast_malloc() implementation. So long as I do NOT override malloc() with my fast_malloc(), this is fine (so long as I protect the print with a mutex to make it thread-safe). BUT, once I do override malloc() with my fast_malloc(), this is NOT fine, because the first time anything is written to stdout, glibc calls malloc() to allocate the stream's buffer. That is the malloc(num_bytes=1024) call made from __GI__IO_file_doallocate in the stack traces. So, once malloc() is overridden by fast_malloc(), we end up with infinite recursion: the first print calls malloc(), which calls printf(), which calls malloc(), which calls printf()...forever, until stack overflow. Since the buffer never finishes being allocated, nothing is ever output.

    So, I see zero of my prints, which made it look as though main() was never entered. The last stack trace in my question shows otherwise: its bottom frame is main(), at the puts() call on line 35 of fast_malloc_unit_tests.c, and that very first print is what starts the recursion. You can also see from that trace that I had 127492 stack frames on my stack at the time of the crash, at which point the stack overflowed. Sanity check: for a stack size of ~7.4 MB, that equates to about 7400000/127492 = ~58 bytes per stack frame, which seems reasonable.
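One common way to break this kind of cycle is a re-entrancy flag: if the allocator is entered again while it is already running (for example because its own debug print asked for a stdio buffer), it skips the print and just allocates. The following is only a minimal sketch of that idea, not my actual fast_malloc() code. The names guarded_malloc(), bump_alloc(), inside_allocator, and suppressed_prints are hypothetical, and a tiny static arena stands in for the real memory pools.

```c
#include <stddef.h>
#include <stdio.h>

// Hypothetical 64 KiB backing store standing in for the real memory pools.
static _Alignas(max_align_t) unsigned char arena[64 * 1024];
static size_t arena_used = 0;

// Set while guarded_malloc() is running on this thread (C11 thread-local).
static _Thread_local int inside_allocator = 0;
// Counts how many debug prints were skipped because of re-entry.
static unsigned long suppressed_prints = 0;

// Trivial bump allocator; it never frees and never calls into stdio.
static void *bump_alloc(size_t num_bytes)
{
    const size_t align = _Alignof(max_align_t);
    if (num_bytes > sizeof(arena) - arena_used)
    {
        return NULL;
    }
    size_t rounded = (num_bytes + align - 1) / align * align;
    if (rounded > sizeof(arena) - arena_used)
    {
        return NULL;
    }
    void *ptr = &arena[arena_used];
    arena_used += rounded;
    return ptr;
}

void *guarded_malloc(size_t num_bytes)
{
    if (inside_allocator)
    {
        // Re-entered from our own debug print: allocate, but do not print.
        suppressed_prints++;
        return bump_alloc(num_bytes);
    }

    inside_allocator = 1;
    fprintf(stderr, "DEBUG: guarded_malloc(%zu)\n", num_bytes);
    void *ptr = bump_alloc(num_bytes);
    inside_allocator = 0;
    return ptr;
}
```

With a guard like this, the nested request for the stream buffer is served silently and the outer print can complete, so the recursion stops after one level instead of running until the stack overflows.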

    The 2nd problem: I'm freeing and reusing memory that the system (glibc) thinks it has safely acquired and still controls

    The code I'm running is my fast_malloc_unit_tests.c program, which, among other things, re-initializes the memory pools I'm using under-the-hood many times. Each time it does this, it considers previously allocated memory to be freed, and it reallocates it when needed. BUT, printf() and other glibc library code (some of which may run before main() is even entered) have already called malloc() and think they still own this memory. So, I end up mistakenly reusing memory they are still using, causing data corruption and crashes.
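The aliasing can be reproduced in miniature. This is a toy sketch, not my real pool code: toy_pool_t, toy_pool_alloc(), toy_pool_reset(), and reset_clobbers_earlier_block() are hypothetical names, and a 256-byte bump pool stands in for the real pools. One block plays the role of a buffer that library code still holds, the pool is then reset, and the next allocation lands on the same address.

```c
#include <stddef.h>
#include <string.h>

// Toy bump pool: hands out consecutive slices of a fixed array.
typedef struct
{
    unsigned char bytes[256];
    size_t used;
} toy_pool_t;

void *toy_pool_alloc(toy_pool_t *pool, size_t num_bytes)
{
    if (num_bytes > sizeof(pool->bytes) - pool->used)
    {
        return NULL;
    }
    void *ptr = &pool->bytes[pool->used];
    pool->used += num_bytes;
    return ptr;
}

// "Re-initialize" the pool: every earlier block is silently treated as free.
void toy_pool_reset(toy_pool_t *pool)
{
    pool->used = 0;
}

// Returns 1 if a block handed out before the reset is overwritten by a
// block handed out after it.
int reset_clobbers_earlier_block(void)
{
    static const char payload[] = "stdout data";
    toy_pool_t pool = { .used = 0 };

    // Stands in for a buffer that library code obtained and still uses.
    char *library_buffer = toy_pool_alloc(&pool, 16);
    memcpy(library_buffer, payload, sizeof(payload));

    toy_pool_reset(&pool);

    // The test harness allocates again and scribbles over its new block.
    char *test_block = toy_pool_alloc(&pool, 16);
    memset(test_block, 0xAA, 16);

    return test_block == library_buffer &&
           memcmp(library_buffer, payload, sizeof(payload)) != 0;
}
```

The function returns 1: both pointers refer to the same bytes, so the earlier owner's data is gone even though it never called free().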

    After disabling all prints inside my malloc() implementation, thereby removing the infinite recursion problem, I was able to see this behavior. In this case, the code did enter my main() function, I did see up to a few dozen of my prints before the crash, and there were only 2 calls (stack frames) on my stack at the time of the crash (rather than 127492 frames). They were:

    #0  0x000055555555589d in fast_malloc_print_stats () at src/fast_malloc.c:464
    #1  0x0000555555556228 in main () at src/fast_malloc_unit_tests.c:129
    

    Full output:

    Program received signal SIGSEGV, Segmentation fault.
    0x000055555555589d in fast_malloc_print_stats () at src/fast_malloc.c:464
    464             block = block->next_free_block;
    (gdb) bt
    #0  0x000055555555589d in fast_malloc_print_stats () at src/fast_malloc.c:464
    #1  0x0000555555556228 in main () at src/fast_malloc_unit_tests.c:129
    

    where fast_malloc.c line 464 contains:

    while (block != NULL)
    {
        free_block_cnt_walked++;
        block = block->next_free_block;   <==== line 464
    }
    

    which as far as I can tell has nothing wrong with it whatsoever: it's a simple pointer assignment, and block was already checked to NOT be NULL, so reading block->next_free_block couldn't possibly be dereferencing a NULL ptr. I think the segmentation fault must therefore be due to corrupted memory, because that memory is being double-used, so the block ptr is probably a corrupted address outside the valid bounds for us to read, hence the seg fault.
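A defensive version of that loop can catch a corrupted pointer before it is dereferenced, by checking that each block address lies inside the pool. This is a rough sketch under assumptions: block_t here is a stripped-down stand-in for my real block struct, and block_ptr_is_plausible() and count_free_blocks_checked() are hypothetical helpers, not functions from fast_malloc.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Stripped-down stand-in for the real free-list node.
typedef struct block_s
{
    struct block_s *next_free_block;
} block_t;

// True if ptr could be a block_t located entirely inside the pool.
bool block_ptr_is_plausible(const void *ptr, const void *pool_start,
                            size_t pool_size)
{
    uintptr_t p = (uintptr_t)ptr;
    uintptr_t start = (uintptr_t)pool_start;

    if (pool_size < sizeof(block_t))
    {
        return false;
    }
    if (p < start || p - start > pool_size - sizeof(block_t))
    {
        return false; // outside the pool
    }
    return p % _Alignof(block_t) == 0; // misaligned pointers are suspect too
}

// Walks the free list, stopping at the first implausible pointer or when
// more nodes have been seen than could fit in the pool (a likely cycle).
size_t count_free_blocks_checked(const block_t *head, const void *pool_start,
                                 size_t pool_size, bool *corrupt)
{
    const size_t max_blocks = pool_size / sizeof(block_t);
    size_t count = 0;
    const block_t *block = head;

    *corrupt = false;
    while (block != NULL)
    {
        if (count >= max_blocks ||
            !block_ptr_is_plausible(block, pool_start, pool_size))
        {
            *corrupt = true;
            break;
        }
        count++;
        block = block->next_free_block;
    }
    return count;
}
```

A check like this does not fix the double use of memory, but it turns a seg fault at an innocent-looking line into an explicit "free list is corrupt" result that is easier to debug.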


    That's it (I think). Now I've got to go do proper fixes and continue work on this. Big thanks goes out to @Employed Russian!

    See also:

    1. [my answer: a safe_printf() function which never calls malloc(), thereby solving the infinite recursion problem!] Which print calls in C do NOT ever call malloc() under the hood?
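For a rough idea of what such a print function can look like (this is an illustrative sketch, not the safe_printf() from the linked answer), one option is to format into a fixed stack buffer and hand the bytes straight to write(), bypassing the stdio stream buffer that triggered the malloc() call. The name debug_write() and the 256-byte limit are assumptions made for this example. Note also that the C standard does not promise that vsnprintf() never allocates. With glibc it typically does not for simple string and integer conversions, but unusual formats (very large widths or precisions, for instance) may still allocate.

```c
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

// Formats into a fixed-size stack buffer and writes it directly to stderr.
// Output longer than the buffer is truncated. Returns the number of bytes
// written, or -1 on error.
long debug_write(const char *format, ...)
{
    char buffer[256];
    va_list args;

    va_start(args, format);
    int length = vsnprintf(buffer, sizeof(buffer), format, args);
    va_end(args);

    if (length < 0)
    {
        return -1;
    }
    if ((size_t)length >= sizeof(buffer))
    {
        length = (int)sizeof(buffer) - 1; // message was truncated
    }

    // write() is a plain system call, so no stdio buffer is involved.
    return (long)write(STDERR_FILENO, buffer, (size_t)length);
}
```

A call such as debug_write("DEBUG: %s(): %d\n", "fast_malloc", 42) could then replace the printf() inside DEBUG_PRINTF() for prints made from within the allocator.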