Tags: c, pointers, memory, process, fork

Logical memory address for variables in C


Consider the following piece of code:

#include <stdio.h>

int main(void) {
  int a = 10;
  /* %p expects a void *, so cast the address explicitly */
  printf("%d %p\n", a, (void *)&a);
  return 0;
}

If I compile the above code and execute it repeatedly, it prints a different address each time.

If the logical memory space is 16 bit, the address from the & operator should be between 0x0000 and 0xFFFF. We know that the address from the & operator is not the same across executions. My question is: what causes this uncertainty in memory address assignment? Since the logical address is mapped to a physical address, shouldn't it be possible to have consistent logical addresses even though the physical addresses change?

Also, if I fork the process, the child process and parent process will print the exact same output for the printf statement. Why does the above behavior not occur when we fork a child process, even though it's spawning a new process?


Solution

  • As explained in the answer linked by @tpr in the comments, the difference in addresses you observed is due to address space layout randomization (ASLR):

    Local variables are allocated on the stack. Traditionally, stack allocation would be repeatable, but this has changed in recent years. Address space layout randomization (ASLR) is a relatively recent innovation in OS memory management, which deliberately makes memory addresses in stack allocations (such as those you have observed) as non-deterministic as possible at runtime. It’s a security feature: this keeps bad actors from exploiting heap buffer overflows, because if the ASLR implementation is entropic enough, who knows what’s going to be there at the end of the overflowing buffer?

    Importantly, ASLR applies to the placement of the stack itself (together with the other data areas connected to the executable). As put succinctly on Wikipedia:

    In order to prevent an attacker from reliably jumping to, for example, a particular exploited function in memory, ASLR randomly arranges the address space positions of key data areas of a process, including the base of the executable and the positions of the stack, heap and libraries.

    The address being the same in a forked process is not due to copy-on-write, as I initially answered. Even if you modify the variable in the forked process, its address stays the same (although the page holding it is copied). Try running the following code:

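    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    int main(void) {
      int a = 10;
      int status;
      printf("%d %p\n", a, (void *)&a);
      fflush(stdout);  /* avoid duplicating buffered output in the child */
      pid_t pid = fork();
      if (pid == -1) {
        perror("fork");
        return 1;
      }
      if (pid == 0) {
        /* child: modify its own copy of a */
        printf("FORKED: %d %p\n", a, (void *)&a);
        a = 11;
        printf("FORKED: %d %p\n", a, (void *)&a);
        return 0;
      } else {
        /* parent: wait for the child, then print its unchanged copy */
        wait(&status);
        printf("%d %p\n", a, (void *)&a);
        return 0;
      }
    }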
    

    You will see that a is modified only in the forked process; the parent prints it unchanged. However, the address is the same in every printed line. This came as a bit of a surprise while writing this answer, so upon searching I found this question. The answer is rather simple:

    Every single process gets its own 4G virtual address space and it's the job of the operating systems and hardware memory managers to map your virtual addresses to physical ones.

    So, while it may seem that two processes have the same address for a variable, that's only the virtual address. The memory manager will map that to a totally different physical address.

    The following two quotes are from the fork(2) man page:

    The child process is created with a single thread—the one that called fork(). The entire virtual address space of the parent is replicated in the child, including the states of mutexes, condition variables, and other pthreads objects

    [...]

    Under Linux, fork() is implemented using copy-on-write pages, so the only penalty that it incurs is the time and memory required to duplicate the parent's page tables, and to create a unique task structure for the child.

    Because of the copy-on-write behavior mentioned in the second quote, equal virtual addresses in the forked process and its parent may still be backed by the same physical address. From Wikipedia:

    Copy-on-write (CoW or COW), sometimes referred to as implicit sharing or shadowing, is a resource-management technique used in computer programming to efficiently implement a "duplicate" or "copy" operation on modifiable resources. If a resource is duplicated but not modified, it is not necessary to create a new resource; the resource can be shared between the copy and the original. Modifications must still create a copy, hence the technique: the copy operation is deferred to the first write.

    Hence, until either process writes to the page holding the variable (or the child calls a member of the exec* family), the same virtual address will most likely correspond to the same physical address in both processes (see the man pages for exceptions).