c++ shared-libraries

Why is the memory of a global variable in shared library freed after I unload the library?


I've read some articles about shared libraries and learned that when a shared library is linked into our programs, its code segment is shared, while its data segment is copied and remapped individually into each program. But I'm still not quite sure about one thing.

Since the data segment is copied into every process, I expected the global variables in the library to remain valid even after the library is unloaded. However, that is not what happens. If I access the address of a global variable in the library after unloading it, I get an invalid memory access fault. This is not what I expected; it seems the data is not really copied.


Solution

  • As was already pointed out in the comments, the C++ language does not specify some important aspects of how a program is launched. It also does not specify how exactly a program interacts with your operating system kernel, or how several independent processes can communicate.

    All these aspects are defined by

    • your actual runtime environment, which foremost means your actual operating system
    • the toolchain you are using, i.e. the combination of your compiler, the associated standard library and the support tooling installed in your OS

    That being said, there is some common reasoning, which is valid on many operating systems, notably those which are POSIX compliant.

    Part of this is the distinction between the code segment and the data segment. Your compiler has produced machine code for your specific target platform. When the library is loaded, the dynamic loader of the OS resolves the addresses this code refers to, relative to the base address at which the library has been mapped. Shared libraries are usually built as position-independent code, so these fix-ups typically land in separate lookup tables rather than in the code itself, and the same code pages can then be mapped into any process. This is the magic of virtual memory management: a page can be made to appear at a specific location in a process's address space, while the physical memory address the OS kernel sees might be quite different.

    So much for the code. After these adjustments, it can be shared easily. But this is not true for the global variables the library may define. Why? Because if we simply shared global variables, and they were mutable, we would punch a hole in the isolation between different processes. One process could assign a value to a global variable, and another process would see this change. This would all be wild, uncoordinated, and dangerous.

    For that reason, the only valid solution that keeps all distinct processes separated is to give each process (virtually) a distinct "instance" of the library. That means the global variables must have a distinct and isolated "timeline" and evolution for each process that uses the library. Simply speaking, this is the role of the data segment: it is mapped privately into the address space of each process when the library is loaded. If we follow this reasoning, it is also logical that at the point where you unload a library, this timeline must end and the instance of the library must vanish, so the mapping is removed again. Any attempt to read from those memory locations after unloading the library is thus correctly sanctioned by the OS with a "Segmentation Fault".
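    The two effects described above can be imitated without any library at all. The following is only a rough sketch for a POSIX system: a private anonymous page stands in for a library's data segment, and munmap stands in for unloading. The helper names (map_private_page, parent_value_after_child_write, read_after_unmap_faults) are made up for this illustration, and a real dynamic loader does considerably more than this.

```cpp
#include <cassert>
#include <csignal>
#include <cstddef>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

static std::size_t page_size() {
    return static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
}

// Map one private, writable page: a stand-in for a library's data segment.
static int* map_private_page() {
    void* p = mmap(nullptr, page_size(), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(p != MAP_FAILED);
    return static_cast<int*>(p);
}

// Returns the value the parent sees after a child process wrote to the page.
int parent_value_after_child_write() {
    int* global = map_private_page();
    *global = 1;
    pid_t pid = fork();
    if (pid == 0) {           // child: the write only affects its own copy
        *global = 2;
        _exit(0);
    }
    assert(pid > 0);
    int status = 0;
    waitpid(pid, &status, 0);
    int seen = *global;       // the parent's copy is untouched
    munmap(global, page_size());
    return seen;
}

// Returns true if reading the page after munmap terminates the reader
// with a memory fault. The risky access runs in a child process.
bool read_after_unmap_faults() {
    pid_t pid = fork();
    if (pid == 0) {
        volatile int* global = map_private_page();
        *global = 42;
        munmap(const_cast<int*>(global), page_size());
        int value = *global;  // the address is no longer mapped
        _exit(value == 42 ? 0 : 1);
    }
    assert(pid > 0);
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFSIGNALED(status) &&
           (WTERMSIG(status) == SIGSEGV || WTERMSIG(status) == SIGBUS);
}
```

    Calling parent_value_after_child_write() should yield the parent's own value, because the child only ever modified its private copy, and read_after_unmap_faults() should report that the access after unmapping was stopped by the OS. The pointer itself never changed; what disappeared is the mapping behind it.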

    Since you tagged your question with C++, it should be added that there is typically a hidden internal mechanism somewhere in your runtime which invokes the constructors of all static (public or hidden) global data. The counterpart is the invocation of the destructors of the same data elements. Within a single translation unit, such objects are in general constructed in the order of their definitions, but the relative order across different translation units is not specified by the C++ standard. Relying on a particular order can cause subtle bugs and security vulnerabilities.
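    One common way to avoid depending on that order is the "construct on first use" idiom, which is a general C++ technique and not something specific to shared libraries. The sketch below uses made-up types (Registry, Client) purely for illustration: the dependency is wrapped in a function-local static, so it is constructed the first time somebody asks for it, no matter which global constructor happens to run first.

```cpp
#include <cassert>
#include <string>

// Hypothetical object that other globals need during their own construction.
struct Registry {
    std::string name;
    Registry() : name("ready") {}
};

// Construct on first use: the instance is created on the first call,
// and this initialization is thread-safe since C++11.
Registry& registry() {
    static Registry instance;
    return instance;
}

// Hypothetical global that depends on the registry while being constructed.
struct Client {
    std::string seen;
    Client() : seen(registry().name) {
        assert(!seen.empty());  // the registry was constructed on demand
    }
};

static Client g_client;  // constructed before main(), or when the library loads

const std::string& client_seen() { return g_client.seen; }
```

    Because the Registry finishes construction before the Client does, it is also destroyed after the Client, so the dependency stays valid during teardown as well.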

    Yet, regarding shared libraries, it is worth noting that constructors are typically invoked at the moment the library is loaded, and destructors are invoked when the library is unloaded, just before the data segment disappears from the memory space of the using process. In particular, this means that when you load or unload a library dynamically at run time (i.e. by calling some OS function to do so), the constructor or destructor code for the static variables in your library runs at that point. The constructor code might even perform heap allocations. All of this is kind of logical, since the "instance" of the library is part of the using process's memory space.
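    For the situation in the question, this suggests copying whatever you need out of the library while it is still loaded. Here is a minimal sketch using the POSIX dlopen / dlsym / dlclose calls. The function name read_global_then_unload is invented for this example, the library path and symbol name are supplied by the caller, and the sketch assumes the symbol refers to an int with C linkage.

```cpp
#include <cassert>
#include <dlfcn.h>

// Loads a library, copies the value of an int global while the library is
// still mapped, then unloads it. Returns false if loading or lookup fails.
bool read_global_then_unload(const char* lib_path, const char* symbol, int& out) {
    assert(symbol != nullptr);
    void* handle = dlopen(lib_path, RTLD_NOW);  // static constructors typically run here
    if (handle == nullptr) {
        return false;                           // dlerror() would describe the failure
    }
    bool found = false;
    if (void* address = dlsym(handle, symbol)) {
        out = *static_cast<int*>(address);      // copy while the data is still mapped
        found = true;
    }
    // If this was the last reference, static destructors may run and the
    // data segment may be unmapped; any pointer obtained from dlsym is
    // dangling afterwards, only the copy in 'out' is safe to use.
    dlclose(handle);
    return found;
}
```

    A call could look like read_global_then_unload("./libcounter.so", "g_counter", value), where both names are placeholders. On older glibc versions the program may need to be linked with -ldl.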

    Needless to say, there have been several quite serious security vulnerabilities that managed to exploit those mechanisms.