c++ linker loader mex elf

How does a loader allocate/deallocate static data from a dynamic module


Question for the linker gurus out there. I have been working with mex files in Matlab and have been getting an awful lot of unexplained crashes, so I want to dig a bit deeper.

Can you explain to me how static data is allocated (deallocated) in a process' virtual memory space when a dynamic module is being loaded (unloaded)?

I assume this takes place in the _init() and _fini() functions. However does the BSS segment get assigned a chunk of memory in the heap space, along with other dynamic memory allocations?

What about global data in a dynamic module? Would there be a possibility of symbol name clashes with the primary executable?

Thanks for shedding light on these issues. If I have to choose a platform I would like to hear from the ELF experts since I do most of my development on Linux.


Solution

  • Can you explain to me how static data is allocated (deallocated) in a process' virtual memory space when a dynamic module is being loaded (unloaded)?

    That part is easy: every ELF file has PT_LOAD segments, which you can see in the output of readelf -Wl foo.so. When the shared object is loaded, each of these segments is mmapped into the process's address space, and that mapping serves as the "allocation" for any static data in the shared object.

    When foo.so is unloaded, the data (and code) is disposed of via the munmap system call.
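    To see this from inside a running process, here is a minimal sketch that walks the PT_LOAD segments of every loaded object with glibc's dl_iterate_phdr and checks that the address of a static variable falls inside one of them. It assumes Linux with glibc; SegmentQuery, visit_object, find_load_segment and g_static_counter are hypothetical names made up for the illustration.

```cpp
#include <link.h>    // dl_iterate_phdr, dl_phdr_info, ElfW (glibc, Linux)
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bookkeeping for the walk below.
struct SegmentQuery {
    std::uintptr_t address;  // address we are trying to locate
    bool found;              // true once some PT_LOAD segment contains it
    int load_segments;       // PT_LOAD segments seen across all loaded objects
};

// Called by dl_iterate_phdr once per loaded object (executable, shared libraries, vdso).
static int visit_object(dl_phdr_info* info, std::size_t, void* data) {
    auto* query = static_cast<SegmentQuery*>(data);
    for (int i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr)& ph = info->dlpi_phdr[i];
        if (ph.p_type != PT_LOAD) continue;
        ++query->load_segments;
        // Run-time range of the segment: load base plus link-time virtual address.
        const std::uintptr_t start = info->dlpi_addr + ph.p_vaddr;
        const std::uintptr_t end = start + ph.p_memsz;
        if (query->address >= start && query->address < end) query->found = true;
    }
    return 0;  // zero means "keep iterating"
}

SegmentQuery find_load_segment(const void* p) {
    SegmentQuery query{reinterpret_cast<std::uintptr_t>(p), false, 0};
    dl_iterate_phdr(visit_object, &query);
    return query;
}

int g_static_counter = 7;  // initialized static data, expected in a writable PT_LOAD segment
```

    Calling find_load_segment(&g_static_counter) should report found == true: the variable was never "allocated" separately, it simply lives inside a mapped segment. The same check applies to a variable defined in a dlopen'ed module.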

    I assume this takes place in the _init() and _fini() functions

    That assumption is not correct. The _init and _fini are about dynamic initialization (e.g. global variables of class type in C++ with a non-trivial constructor/destructor). By the time _init is called, the memory for all globals has already been "reserved" via mmap.
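    A small sketch of that distinction, using hypothetical names (g_zeroed, g_preset, Probe): the two plain globals need no code to run at load time, while the Probe object has a non-trivial constructor and is therefore set up during the init phase (on current GNU toolchains such constructors are typically dispatched through .init_array rather than _init itself). The constructor records what it sees in the other globals, which shows that their storage and initial values were already in place.

```cpp
#include <cassert>

int g_zeroed;       // no initializer: zero-initialized storage (.bss)
int g_preset = 42;  // constant initializer: bytes come straight from the file (.data)

struct Probe {
    int seen_zeroed;
    int seen_preset;
    // Non-trivial constructor: this is the "dynamic initialization" part.
    Probe() : seen_zeroed(g_zeroed), seen_preset(g_preset) {}
};

Probe g_probe;  // constructed when the module's initializers run, after the segments are mapped

// True if the constructor observed fully set-up static storage.
bool storage_was_ready_before_constructors() {
    return g_probe.seen_zeroed == 0 && g_probe.seen_preset == 42;
}
```

    The same reasoning applies in reverse at unload time: destructors run first, and only afterwards is the memory unmapped.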

    However does the BSS segment

    The .bss section belongs to the same PT_LOAD segment as the other writable, initialized data. This is why ElfXX_Phdr has separate p_filesz and p_memsz fields: p_filesz "covers" the initialized data stored in the file, and the (larger) p_memsz makes the mmap "allocate" space for both the initialized data and .bss. The extra p_memsz - p_filesz bytes are zero-filled and come from the mapping itself, not from the heap.
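    The p_filesz / p_memsz relationship can be observed directly. The sketch below (Linux with glibc assumed; SegmentSizes, measure_segment, sizes_for and g_scratch are hypothetical names) finds the PT_LOAD segment containing a zero-initialized array and reports both sizes, plus the array's offset within the segment.

```cpp
#include <link.h>    // dl_iterate_phdr, dl_phdr_info, ElfW (glibc, Linux)
#include <cassert>
#include <cstddef>
#include <cstdint>

unsigned char g_scratch[4096];  // zero-initialized, so the toolchain is expected to place it in .bss

struct SegmentSizes {
    std::uintptr_t address;  // address being looked up
    bool found;              // true if a PT_LOAD segment contains the address
    std::size_t file_bytes;  // p_filesz: bytes backed by the file
    std::size_t mem_bytes;   // p_memsz: bytes occupied in memory
    std::size_t offset;      // offset of the address from the segment start
};

static int measure_segment(dl_phdr_info* info, std::size_t, void* data) {
    auto* out = static_cast<SegmentSizes*>(data);
    for (int i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr)& ph = info->dlpi_phdr[i];
        if (ph.p_type != PT_LOAD) continue;
        const std::uintptr_t start = info->dlpi_addr + ph.p_vaddr;
        if (out->address < start || out->address >= start + ph.p_memsz) continue;
        out->found = true;
        out->file_bytes = ph.p_filesz;
        out->mem_bytes = ph.p_memsz;
        out->offset = out->address - start;
        return 1;  // non-zero stops the iteration
    }
    return 0;
}

SegmentSizes sizes_for(const void* p) {
    SegmentSizes result{reinterpret_cast<std::uintptr_t>(p), false, 0, 0, 0};
    dl_iterate_phdr(measure_segment, &result);
    return result;
}
```

    For sizes_for(g_scratch), mem_bytes should exceed file_bytes, and offset should be at least file_bytes, since the array sits in the zero-filled tail that has no bytes in the file.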

    What about global data in a dynamic module?

    What about it? I covered initialized data above.

    Would there be a possibility of symbol name clashes with the primary executable?

    Certainly. You can define int foo = 42; in a.out, and int foo = 24; in foo.so. The usual rule is that if foo is visible in the dynamic symbol table of a.out, then that foo will be used regardless of where it is referenced from.

    Complications arise when a.out does not export foo (if it is not linked with -rdynamic and does not link against foo.so), or when foo.so is linked with -Bsymbolic.
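    When chasing a suspected clash, it helps to ask the loader which module actually provides a given name. A rough sketch, assuming glibc (dlsym with RTLD_DEFAULT and dladdr are GNU extensions, and glibc older than 2.34 needs -ldl at link time); defining_object is a hypothetical helper name:

```cpp
#include <dlfcn.h>   // dlsym, dladdr, RTLD_DEFAULT, Dl_info (glibc)
#include <cassert>
#include <string>

// Returns the path of the object whose definition wins the default
// (global scope) lookup for `symbol`, or an empty string if none is found.
std::string defining_object(const char* symbol) {
    void* addr = dlsym(RTLD_DEFAULT, symbol);
    if (addr == nullptr) return "";
    Dl_info info{};
    if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) return "";
    return info.dli_fname;
}
```

    For example, defining_object("malloc") normally names the C library (or whatever allocator library has been interposed in front of it). Calling it with "foo" from inside the process would show whether the a.out definition or the foo.so one is picked up by the global lookup. Note that a name resolved inside foo.so via -Bsymbolic bypasses this lookup, so the result only describes the default rule.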