I was reading this amazing article about dynamic and static linking: https://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html
After finishing it, two questions are still unanswered, or not clear enough for me to understand.
1)
This is not fine for a shared library (.so). The whole point of a shared library is that applications pick-and-choose random permutations of libraries to achieve what they want. If your shared library is built to only work when loaded at one particular address everything may be fine — until another library comes along that was built also using that address. The problem is actually somewhat tractable — you can just enumerate every single shared library on the system and assign them all unique address ranges, ensuring that whatever combinations of library are loaded they never overlap. This is essentially what prelinking does (although that is a hint, rather than a fixed, required address base). Apart from being a maintenance nightmare, with 32-bit systems you rapidly start to run out of address-space if you try to give every possible library a unique location. Thus when you examine a shared library, they do not specify a particular base address to be loaded at
Then how does dynamic linking solve this issue? On the one hand the writer says we can't load every library at the same address, and on the other hand he says giving every library its own unique address will exhaust the address space. I see a contradiction here. (Note: I know what a virtual address is.)
2)
This handles data, but what about function calls? The indirection used here is called a procedure linkage table or PLT. Code does not call an external function directly, but only via a PLT stub. Let's examine this:
I didn't get it: why is the handling of data different from that of functions? What's the problem with saving functions' addresses inside the GOT, as we do with normal variables?
On the one hand the writer says we can't load every library at the same address, and on the other hand he says giving every library its own unique address will exhaust the address space.
On Linux before the switch to ELF some 15-20 years ago, all shared libraries had to be globally coordinated. This was a maintenance nightmare, because a system can have many hundreds of shared libraries. You run out of address space assigning a unique range to each library, even though some of those libraries are never loaded together (but whoever assigns the ranges doesn't know a priori which libraries will never be loaded together, and could therefore safely share a range).
The dynamic loader solves this by placing libraries at an arbitrary address range as they are loaded, and relocating them so they execute correctly at the address they have just been loaded at.
The advantage here is that you don't need to partition your address space ahead of time.
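A minimal sketch that makes this visible (assuming a Linux/glibc system; build with `gcc where.c -ldl`): ask the dynamic loader where it put `libm.so.6` this time. With address-space layout randomization the address typically differs between runs, because nothing in the library pins it to a fixed base.

```c
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    /* Load libm at whatever address the dynamic loader picks. */
    void *h = dlopen("libm.so.6", RTLD_NOW);
    if (!h) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }
    /* Where did cos() end up in this process? */
    void *cos_addr = dlsym(h, "cos");
    printf("cos() resolved at %p\n", cos_addr);
    dlclose(h);
    return 0;
}
```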
why is the handling of data different from that of functions?
It's different because when code accesses data, the dynamic linker is not involved: the very first access must already work, so the data must be relocated before the library code runs at all. There's no function call you can hook to do lazy dynamic linking for data.
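A minimal sketch of the data case (the file names and the `shared_counter` symbol are invented for illustration): compiled as position-independent code, the read below goes through a GOT slot, and that slot has to be filled in eagerly at load time.

```c
/* libdemo.c -- a hypothetical shared library:
 *   gcc -shared -fPIC -o libdemo.so libdemo.c */
int shared_counter = 42;

/* main.c -- gcc main.c ./libdemo.so
 * Built as PIC/PIE (the default on most current distros), the read of
 * shared_counter goes through a GOT slot. That slot must already hold
 * the variable's real address when this first access executes; there
 * is no stub that could trap a data access, so the loader relocates
 * it eagerly rather than lazily. */
#include <stdio.h>

extern int shared_counter;

int main(void) {
    printf("shared_counter = %d\n", shared_counter);
    return 0;
}
```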
But for a function call, the linker can be involved. The program calls a PLT "stub" function, `foo@plt`. On the first call to that stub, it performs the work of resolving a pointer to the actual `foo()` definition, and saves that pointer. On subsequent calls, `foo@plt` just uses the already-saved pointer to jump directly to the definition of `foo()`.
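Here is a user-space sketch of that behaviour (the names `foo_slot` and `resolve_foo` are invented; in reality `ld.so` patches the GOT entry behind `foo@plt`): a function pointer starts out aimed at a resolver, which looks up the real definition, overwrites the pointer, and completes the call; every later call goes straight through.

```c
#include <stdio.h>

static void real_foo(void) { puts("real foo()"); }

static void resolve_foo(void);

/* Plays the role of the GOT entry behind foo@plt:
 * it initially points at the resolver stub. */
static void (*foo_slot)(void) = resolve_foo;

static void resolve_foo(void) {
    puts("resolving foo ...");   /* stands in for ld.so's symbol lookup */
    foo_slot = real_foo;         /* patch the slot with the real address */
    foo_slot();                  /* finish the call the caller wanted */
}

int main(void) {
    foo_slot();  /* first call: detours through the resolver */
    foo_slot();  /* later calls: jump directly to real_foo() */
    return 0;
}
```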
This is called lazy relocation, and it saves a lot of work if the program never reaches many of the library functions it has call-sites for (e.g. a program that evaluates a math expression and could call any `libm.so.6` function, but for normal simple inputs, or with `--help`, only calls a couple).
You can observe the effect of lazy relocation by running a large program with lots of shared libraries with and without the `LD_BIND_NOW` environment variable (which disables lazy relocation), e.g. `LD_BIND_NOW=1 ./big-program`.
Or with `gcc -fno-plt` (https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00225.html), GCC will inline the call through the GOT, meaning the library function is reached in one call instead of two. (Some x86-64 Linux distros enable this for their binary packages.) This requires early binding, but slightly reduces the cost of each call, so it is good for long-running programs. (PLT + early binding is the worst of both, except for having cache locality while resolving everything.)