Here is my question. Suppose you want to compile the c code:
void some_function() {
write_string("Hello, World!\n");
}
For this example, I want to focus specifically on the string: "Hello, World!\n". My understanding is that the compiler will put the string into the .rodata section in an elf file. A symbol, referring to its location in the .rodata section, is added to the symbol table and that symbol is kept in the .text section as a placeholder for the location of the string.
Here is the problem. How can you leave a value like that unresolved in machine code? In x86, it should be easy enough for the linker to do a find and replace on the symbol when the location is known. However, there are many CPU architectures where an address can not be encoded in its entirety into a single machine instruction. Therefore the value would have to be loaded in 2 stages, using separate machine instructions and the linker would have to figure that out. It would have to be smart enough to manipulate the machine code with half the address in one place the half the address in another. Furthermore, somehow the elf file has to represent this complex encoding scheme for the linker later on. How does this all work?
I most programs, this will be in a user space application. So the kernel may load the .rodata section wherever it wants in memory. So it would seem that when the program is loaded, somehow, at runtime, the kernel loader would have to resolve all these symbols in the program prior to beginning execution. It would have to inject into the machine code where it put each section so they may be referenced appropriately. How does this work?
I have a feeling that my understanding and above descriptions are wrong or that I am missing something very important because this does not seem right to me. Ether that, or there is in fact the logic to preform these complex functions within modern kernels and linkers. I am looking for some further explanation and understanding.
Compilation takes place, emitting something like this:
lea rdi, [rip+some_function.hello_world]
mov rax, [rip+some_function.write_string]
call rax
after the asm pass, we end up with something that disassembles to
lea rdi, [rip+00000000]
mov rax, [rip+00000000]
call rax
where the two 00000000
slots are filled as load-time fixups. The loader performs symbol resolution and fills in the 00000000
values with the correct values.
This is a simplification. In reality there's an extra layer of indirection called the global offset table, which is used (among other things) to put all the fixups right next to each other.
The innards of how this works is CPU and OS specific, but in general you don't really have to care exactly how it works, and it could change in the next release of the compiler (and has changed at least twice already). The loader understands fixups at a very generic level using a fixup table, and can deal with new ideas so long as they resolve to put (absolute or relative) address of a symbol at offset + size.
The Alpha processor had it kind of bad back in the day. Fixups had to be in between functions, and relative addressing could be only done in signed 16 bit sizes, so the fixups for functions were located immediately before or after each function, and presumably you got an error in the ASM pass if the pointer didn't fit because the function was too big. I did come up with a clever sequence that would have fixed the problem on Alpha, but that was long after the platform was retired, and nobody cares anymore so it never got implemented.
I remember the bad old days from before the loader could do good patchups. There once was a global (and I really do mean global) table of shared library load addresses, and the compiler emitted absolute addresses and you had to rebuild your application if you changed a library, even though you used shared libraries. That just wasn't the brightest ideas, and no wonder people keps statically linked emergency binaries lying around. Breaking libc wasn't fun.