Here's a simple program:
void __attribute__ ((constructor)) dumb_constructor(){}
void __attribute__ ((destructor)) dumb_destructor(){}
int main() {}
I compile it with the following flags:
g++ -O0 -fverbose-asm -no-pie -g -o main main.cpp
I check with gdb that __libc_csu_init is calling the function I tagged with constructor:
Breakpoint 1, dumb_constructor () at main.cpp:1
1 void __attribute__ ((constructor)) dumb_constructor(){}
(gdb) bt
#0 dumb_constructor () at main.cpp:1
#1 0x000000000040116d in __libc_csu_init ()
#2 0x00007ffff7abcfb0 in __libc_start_main () from /usr/lib/libc.so.6
#3 0x000000000040104e in _start ()
and I assume that the destructor attribute would mean dumb_destructor() would be called during __libc_csu_fini, but that's not happening:
Breakpoint 1, dumb_destructor () at main.cpp:3
3 void __attribute__ ((destructor)) dumb_destructor(){}
(gdb) bt
#0 dumb_destructor () at main.cpp:3
#1 0x00007ffff7fe242b in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#2 0x00007ffff7ad4537 in __run_exit_handlers () from /usr/lib/libc.so.6
#3 0x00007ffff7ad46ee in exit () from /usr/lib/libc.so.6
#4 0x00007ffff7abd02a in __libc_start_main () from /usr/lib/libc.so.6
#5 0x000000000040104e in _start ()
I sanity checked with objdump that __libc_csu_fini really isn't doing anything, and indeed it is just a stub:
0000000000401190 <__libc_csu_fini>:
401190: f3 0f 1e fa endbr64
401194: c3 ret
Why do we call this _dl_fini? What is _dl_fini? Why is it being inconsistent and not calling __libc_csu_fini?
I refer to the most recent glibc version tag as of writing this, which is glibc 2.34 (published in August 2021), and which changed quite a bit of the startup process (I highlight the major differences). Most findings should also apply to other versions and architectures. The ELF dumps in this answer are from an x86-64 system.
Before we can look into the destructors, we have to understand what is going on at startup.
I skip some kernel-mode parts here for brevity. We start at a point where our program's ELF file is already mapped into memory according to its segment ("program header") table:
$ readelf -l a.out
Elf file type is DYN (Shared object file)
Entry point 0x10a0
There are 13 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x00000000000002d8 0x00000000000002d8 R 0x8
INTERP 0x0000000000000318 0x0000000000000318 0x0000000000000318
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000628 0x0000000000000628 R 0x1000
LOAD 0x0000000000001000 0x0000000000001000 0x0000000000001000
0x0000000000000215 0x0000000000000215 R E 0x1000
LOAD 0x0000000000002000 0x0000000000002000 0x0000000000002000
0x00000000000001a0 0x00000000000001a0 R 0x1000
LOAD 0x0000000000002da8 0x0000000000003da8 0x0000000000003da8
0x0000000000000268 0x0000000000000270 RW 0x1000
...(and a few more)
Our application is dynamically linked (i.e., the ELF file does not contain all functions it calls), so we have to load all dependencies into the process's virtual address space as well. However, the kernel itself has only a limited understanding of the ELF format, and should not make too many assumptions about the user space environment anyway. Thus, ELF specifies a special interpreter program, whose path can be found in the INTERP segment.
On Linux, this usually happens to be the dynamic linker /lib64/ld-linux-x86-64.so.2. The kernel subsequently loads that dynamic linker ELF into the same virtual address space as our application and then calls the dynamic linker's entry point (not the entry point of our application).
The dynamic linker now reads the DYNAMIC segment (dynamic table) of our program, which contains information about needed dependencies, symbol tables, relocations and so on:
$ readelf -d a.out
Dynamic section at offset 0x2dc8 contains 27 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000c (INIT) 0x1000
0x000000000000000d (FINI) 0x1208
0x0000000000000019 (INIT_ARRAY) 0x3da8
0x000000000000001b (INIT_ARRAYSZ) 16 (bytes)
0x000000000000001a (FINI_ARRAY) 0x3db8
0x000000000000001c (FINI_ARRAYSZ) 16 (bytes)
0x000000006ffffef5 (GNU_HASH) 0x3a0
0x0000000000000005 (STRTAB) 0x470
0x0000000000000006 (SYMTAB) 0x3c8
0x000000000000000a (STRSZ) 130 (bytes)
...(and a few more)
With that information it starts visiting all NEEDED dependencies of our program, recursively. Each dependency is mapped into memory and relocated, and then initialized via dl_init, which calls all functions from the INIT/INIT_ARRAY dynamic table entries (i.e., the library's constructors). Once the dynamic linker is done and all dependencies are loaded and initialized, it hands over control to our application's entry point (_start).
_start gets a few arguments, most notably a function pointer to _dl_fini in rdx. _start then prepares the stack, places some arguments in registers and finally calls __libc_start_main.
__libc_start_main receives the following arguments:

- main (which is the main function we wrote)
- argc, argv
- init (pointing to __libc_csu_init before glibc 2.34)
- fini (pointing to __libc_csu_fini before glibc 2.34)
- rtld_fini (which equals the rdx argument of _start and thus points to _dl_fini)

The function does some initialization of libc, sets up thread local storage and stack canaries, and a lot more. Here we only care about two calls:
- __cxa_atexit ((void (*) (void *)) rtld_fini, NULL, NULL);, which registers _dl_fini as a destructor to run after program exit
- a call to init, which points to __libc_csu_init (< glibc 2.34) or to call_init (>= glibc 2.34)
Both __libc_csu_init and call_init do basically the same thing: they run all constructors registered in the dynamic table entries INIT and INIT_ARRAY. However, while __libc_csu_init is statically compiled into our program, call_init lives in libc and thus in a different memory region. This was changed after security researchers found a ROP gadget in __libc_csu_init's assembly code.
We thus observe the following backtrace for each constructor:
my_constructor()
__libc_csu_init() (< glibc 2.34) or call_init() (>= glibc 2.34)
__libc_start_main()
_start()
After __libc_start_main is done with initialization, it transfers control to our main function:
_Noreturn static __always_inline void
__libc_start_call_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
int argc, char **argv MAIN_AUXVEC_DECL)
{
exit (main (argc, argv, __environ MAIN_AUXVEC_PARAM));
}
We now have seen what happens when an executable is initialized. But what about the end?
As we can see in the code snippet above, exit runs as soon as main returns. So what does exit do?
Turns out, it only transfers control to __run_exit_handlers:
void
exit (int status)
{
__run_exit_handlers (status, &__exit_funcs, true, true);
}
__run_exit_handlers then calls the various functions which have been registered in the __exit_funcs list via calls like __cxa_atexit. If we now look back at the startup procedure, we see that this list should also contain our _dl_fini function, as it was passed as the rtld_fini argument to _start/__libc_start_main!
_dl_fini is the finalizer of the dynamic linker, which iterates through all dependencies and our executable and runs the destructors from FINI and FINI_ARRAY for each of them.
We thus get the following backtrace for each destructor:
my_destructor()
_dl_fini()
__run_exit_handlers()
exit()
__libc_start_main()
_start()
This answers the "what", but not the "why".
Why not __libc_csu_fini?

(please take the following with a grain of salt - I could not find sources for the original reasoning, but inferred it from the source code, commit messages and some comments)
I believe that actually the contrary was intended: to be more consistent. The dynamic linker took care of running the constructors of all dependencies, so it should also run their destructors. And as our program is not much different from those dependencies, why not run its destructors as well? Probably that is the reason why __libc_csu_fini was disabled around 17 years ago. I am not sure why it wasn't removed completely - probably to keep compatibility with existing compilers.
With the recent release of glibc 2.34, both the __libc_csu_init and __libc_csu_fini functions were removed entirely, as their tasks are now done by other parts of the runtime.
Why not run the constructors in dl_init?

Well, dl_init runs before our app's entry point _start - at a time when several important parts of the runtime are not yet available (their initialization is done in __libc_start_main). So our constructors would need to be self-contained and avoid calling external functions. As this would pose quite a risk for reliability and security, the constructors are instead executed after all other initialization is done.
Actually, there is support for initialization functions which are executed by dl_init - these may be specified via the PREINIT_ARRAY dynamic table entry, and run before our _start function. However, there does not appear to be a straightforward way to register these with the compiler, and it is not recommended for the above reasons anyway.
Note: Answering this question took a lot of digging into the inner workings of glibc, which turned out to be even more complex than I initially expected. In order to make this a coherent answer, I had to simplify a few things and skip others. If you find anything inaccurate, please feel free to edit or raise this in the comments.