Tags: memory, process, operating-system, ram

What would be an example of position-dependent code?


Position-dependent code is written to be loaded in, and run from, a particular physical address in memory. One of the problems that this type of code poses is that it hinders the processor's capability to run multiple processes concurrently, mainly when different processes that were written to run from the same address try to get executed simultaneously.

Having said that, I have never encountered code that specifies the memory address at which it is meant to be executed, so I find it difficult to picture what such code would look like. I can see that a given piece of code could specify the address at which a particular variable is meant to be stored in memory, but when it comes to the [first] memory address at which the program is going to be loaded, I don't see why this isn't the OS's job rather than the responsibility of the program.


Solution

  • One of the problems that this type of code poses is that it hinders the processor's capability to run multiple processes concurrently, mainly when different processes that were written to run from the same address try to get executed simultaneously.

    Most consumer desktop computers use paging. With paging, the addresses in CPU instructions are virtual rather than physical. Each address an instruction uses is passed to the MMU (Memory Management Unit), which translates it through the page tables. The page tables can map a virtual address to any location in physical RAM.

    On today's computers, each thread runs on one specific core at any given moment. Each core has its own Page Table Base Pointer (PTBP) register (CR3 on x86-64), which contains the physical address of the beginning of the first level of page tables. When the OS wants to switch threads, it saves the currently executing thread's state and loads the other thread's state in its place. That includes the PTBP register when the other thread belongs to a different process, since threads of the same process share their page tables.

    Since virtual addresses (VA) can translate anywhere in RAM, each process (and every thread inside it) can have access to all the virtual address space (VAS) available. Since the VAS spans more than 100k GB, the amount of RAM a process can use is limited only by the physical memory available. Processes can start at the same virtual address and still execute concurrently, as long as their page tables translate the addresses they mention to different physical addresses.
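
    To make the translation step concrete, here is a minimal sketch of a toy single-level page table. The names page_table and translate, and the 16-page address space, are invented for illustration; real x86-64 page tables have several levels and carry permission bits.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u   /* 4 KiB pages, the common x86-64 page size */
#define NUM_PAGES 16u     /* toy address space of 16 virtual pages */

/* Hypothetical single-level page table: entry i holds the physical
   frame number that backs virtual page i. */
typedef struct {
    uint32_t frame[NUM_PAGES];
} page_table;

/* Translate a virtual address roughly the way an MMU would: split it
   into page number and offset, look up the frame, then recombine.
   The modulo only keeps the toy lookup inside the table. */
uint64_t translate(const page_table *pt, uint64_t vaddr)
{
    uint64_t page   = (vaddr / PAGE_SIZE) % NUM_PAGES;
    uint64_t offset = vaddr % PAGE_SIZE;
    return (uint64_t)pt->frame[page] * PAGE_SIZE + offset;
}
```

    If process A's table maps page 1 to frame 7 and process B's table maps page 1 to frame 42, then the same virtual address 0x1234 translates to 0x7234 for A and to 0x2A234 for B, so both can use that address at the same time.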

    I never encountered code that specifies the memory address on which it's meant to be executed, and so I find it difficult to picture what would such a code look like.

    Most executables you encounter actually do specify a starting address. That address is mostly a suggestion and isn't necessarily respected by the OS loader. It works that way because, before ASLR, code was usually not position independent and simply referred to everything by absolute address. Most code today is position independent. Even with position-independent code, the compiler's output contains many addresses and offsets that you never see in your source. The compiler (together with the linker) is responsible for calculating the addresses and offsets your code needs to reach its functions and data.
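
    When a loader does not respect the suggested address of position-dependent code, every absolute address stored in the image has to be patched. The following is a rough sketch of that idea only; toy_image, its fields, and rebase are hypothetical and do not correspond to a real executable format or loader API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory image of position-dependent code. */
typedef struct {
    uint64_t      preferred_base; /* load address the linker assumed */
    uint64_t     *words;          /* image contents as 64-bit words */
    const size_t *fixups;         /* indexes of words holding absolute addresses */
    size_t        fixup_count;
} toy_image;

/* Rebase the image: add the difference between the actual and the
   preferred load address to every recorded absolute address. */
void rebase(toy_image *img, uint64_t actual_base)
{
    uint64_t delta = actual_base - img->preferred_base;
    for (size_t i = 0; i < img->fixup_count; i++)
        img->words[img->fixups[i]] += delta;
    img->preferred_base = actual_base;
}
```

    Position-independent code avoids most of this patching because its instructions do not embed absolute addresses in the first place.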

    There are mainly three kinds of memory a program uses on modern computers:

    1. Automatic memory (the stack)

    For automatic storage, the compiler only calculates offsets from the stack pointer register, or alternatively from the base pointer register. For example, on x64 without optimization you'll have something like:

    int main( void ) {
        int a;
        a = 3;
        return 0;
    }
    

    Which compiles to:

    main:
            push    rbp
            mov     rbp, rsp
            mov     DWORD PTR [rbp-4], 3
            mov     eax, 0
            pop     rbp
            ret
    

    Here you see the third instruction mov DWORD PTR [rbp-4], 3 is accessing the stack with an offset of 4 bytes from RBP and placing the value 3 at that address as requested.

    2. Static/global data

    Static or global data is data which is either declared static or declared outside a function. This data has reserved space in the executable file after compilation. The OS places that data in RAM during program loading. The data is accessed using PC-relative addressing (with an offset from the program counter) to make accesses position independent. Again on x64 you'll have something like:

    int glob;
    
    int main( void ) {
        glob = 5;
        return 0;
    }
    

    Compiling to:

    glob:
            .zero   4
    main:
            push    rbp
            mov     rbp, rsp
            mov     DWORD PTR glob[rip], 5
            mov     eax, 0
            pop     rbp
            ret
    

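    Again, the third instruction puts 5 in the glob integer by dereferencing an offset from RIP (the PC on x64). A position-dependent build would instead embed the absolute address of glob in the instruction, for example mov DWORD PTR [0x404028], 5 (the address is made up for illustration), which is only correct if the program is loaded where the linker assumed.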
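
    The arithmetic behind RIP-relative addressing can be sketched as follows. The function names and the sample addresses are assumptions chosen for illustration, and the sketch assumes the target lies within the 32-bit displacement range.

```c
#include <stdint.h>

/* Displacement encoded in the instruction: the distance from the end
   of the instruction to the target, fixed when the program is linked. */
int32_t rip_displacement(uint64_t next_insn, uint64_t target)
{
    return (int32_t)(int64_t)(target - next_insn);
}

/* Effective address computed when the instruction runs: RIP (which
   points at the next instruction) plus the encoded displacement. */
uint64_t rip_effective(uint64_t next_insn, int32_t disp)
{
    return next_insn + (uint64_t)(int64_t)disp;
}
```

    If the code sits at offset 0x1010 and glob at offset 0x4020 from the load base, the displacement is 0x3010 whether the base is 0x400000 or 0x555555554000, which is why the same instruction bytes work at any load address.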

    3. Heap data

    The heap has almost half the VAS available for its data. It can span from the end of the executable code and static data up to half the VAS (more than 100k GB). The higher half is reserved for the kernel (which obviously doesn't need all of it). The heap is allocated dynamically. In C/C++ you get memory there by calling malloc() or using the new keyword, and the allocator in turn makes a system call when it needs more memory from the OS. Because the address is only known at run time, this memory is accessed through pointers: depending on the operation, the address is typically held in a register and dereferenced from there.
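
    As with the two previous kinds of memory, a small example may help. This is a minimal sketch, and heap_round_trip is a name made up for illustration:

```c
#include <stdlib.h>

/* Allocate one int on the heap, store a value, read it back, free it.
   Returns the value read, or -1 if the allocation fails. */
int heap_round_trip(int value)
{
    int *p = malloc(sizeof *p);  /* address is chosen at run time */
    if (p == NULL)
        return -1;
    *p = value;                  /* store through the pointer */
    int result = *p;             /* load through the pointer */
    free(p);
    return result;
}
```

    The compiler cannot know the address in p ahead of time, so it emits loads and stores that go through whatever register holds the pointer, rather than an offset from RSP, RBP, or RIP.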