
How are memory addresses assigned in C?


I've learnt that the values of an array are stored "side by side" in terms of memory addresses, and that the name of the array acts as a pointer to the first value of the array:

#include <stdio.h>

int main() {
    int array[] = {1, 2, 3};
    printf("%d", *array);       // The first value of array: 1
    printf("%d", *(array + 1)); // The second value of array: 2

}

Intuitively, I thought that variables declared in the code one after another were simply assigned neighbouring memory addresses. But that idea seems to clash with what makes an array special: if it were true, all the variables in my code would effectively make up one large array.

My question is, essentially, is there a way to know what the address of a variable is, relative to the other variables defined in my program, without, say, printing its address?


Solution

  • Intuitively, I thought that variables declared in the code one after another were simply assigned neighbouring memory addresses.

    The C compiler converts your code (written for "the C abstract machine") into whatever produces the same observable behavior in a completely different language (e.g. machine code for the target CPU).

    As part of this "conversion to something radically different", local variables often cease to exist (they are replaced by registers, which have no memory address because registers aren't memory). Even when they do exist, they can be in any order, or overlap (e.g. the same memory used for two different local variables that are only needed at different times).

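    Arrays are "more special" because they're typically larger and harder for compilers to optimize (while still complying with the rules of the language that define the abstract machine's behavior), so it's a lot more likely that an array actually ends up in memory. In the abstract machine its elements are guaranteed to be contiguous (that is what makes *(array + 1) work), and any element addresses your program actually observes will reflect that layout. What is not guaranteed is that the array occupies memory at all.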
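
    The part that is guaranteed can be checked by taking the element addresses yourself. The following is a minimal sketch; element_stride is a hypothetical helper written only for this illustration, not a library function. It measures the byte distance between neighbouring elements, which the standard requires to be exactly the size of one element. It also checks that array[i] means *(array + i).

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical helper: byte distance between element i and element i + 1.
     * Array elements are contiguous, so this is always sizeof arr[0]. */
    static ptrdiff_t element_stride(const int *arr, size_t i) {
        return (const char *)&arr[i + 1] - (const char *)&arr[i];
    }

    int main(void) {
        int array[] = {1, 2, 3};

        for (size_t i = 0; i + 1 < sizeof array / sizeof array[0]; i++) {
            /* Subscripting is defined as pointer arithmetic. */
            assert(array[i] == *(array + i));
            assert(element_stride(array, i) == (ptrdiff_t)sizeof array[0]);
        }
        printf("stride: %td bytes\n", element_stride(array, 0));
        return 0;
    }
    ```

    Note that this only tells you about elements of the same array. It says nothing about where separate variables end up.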

    For an example, consider this code:

    int foo(int bar) {
        int myArray[] = { 1, 2, 3, 4};
    
        if(bar < 0) return bar + myArray[0];
        if(bar > 0) return bar + myArray[2];
        return bar + myArray[1];
    }
    

    If you compile this with optimizations enabled (like I did using godbolt at https://godbolt.org/) and examine the output, you might see something like this:

    foo(int):
      test edi, edi
      js .L6
      lea eax, [rdi+3]
      mov edx, 2
      cmove eax, edx
      ret
    .L6:
      lea eax, [rdi+1]
      ret
    

    As you can see, the array no longer exists at all (and none of the array's elements have a memory address), because in this case the compiler was smart enough to fold the constant elements directly into the instructions.

    The same thing happens to your code: the array no longer exists and has no addresses at all. It becomes this:
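
    One way to convince yourself that nothing observable was lost is to translate the assembly back into C by hand and compare it with the original. The assembly returns bar + 1 for negative inputs, 2 for zero, and bar + 3 otherwise, which are exactly myArray[0], myArray[1] and myArray[2] folded in.

    The following is a minimal sketch; foo_as_compiled is a hypothetical name for that hand translation. The comparison is a spot check over a finite range, not a proof.

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* The original function from above. */
    int foo(int bar) {
        int myArray[] = { 1, 2, 3, 4 };

        if (bar < 0) return bar + myArray[0];
        if (bar > 0) return bar + myArray[2];
        return bar + myArray[1];
    }

    /* Hypothetical hand translation of the optimized assembly: no array,
     * only the constants 1, 2 and 3 folded into the arithmetic. */
    int foo_as_compiled(int bar) {
        if (bar < 0) return bar + 1;   /* js .L6 -> lea eax, [rdi+1]   */
        if (bar == 0) return 2;        /* cmove eax, edx with edx = 2  */
        return bar + 3;                /* lea eax, [rdi+3]             */
    }

    int main(void) {
        for (int bar = -1000; bar <= 1000; bar++)
            assert(foo(bar) == foo_as_compiled(bar));
        puts("foo and foo_as_compiled agree on [-1000, 1000]");
        return 0;
    }
    ```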

    .LC0:
      .string "%d"
    main:
      sub rsp, 8
      mov esi, 1                  # The value "1" originally came from the array
      mov edi, OFFSET FLAT:.LC0
      xor eax, eax
      call printf
      mov esi, 2                  # The value "2" originally came from the array
      mov edi, OFFSET FLAT:.LC0
      xor eax, eax
      call printf
      xor eax, eax
      add rsp, 8
      ret
    

  • My question is, essentially, is there a way to know what the address of a variable is.

    Essentially, no. This is like feeding a horse carrots and then trying to determine where the molecules of the original carrot will end up after the horse poops.

    The only thing you can do is get the address at run time (e.g. using &variable). Unless the compiler can prove that the code taking the address can be discarded or ignored, doing so has the side effect of forcing the compiler to make sure that the variable actually does have an address.
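
    The following is a minimal sketch of what run-time inspection can and cannot tell you; locals_are_distinct is a hypothetical function made up for this illustration. Printing the addresses makes them observable, so each variable must get real storage.

    The numeric order that gets printed is unspecified and may change with the compiler, flags or platform. Two separate objects that are alive at the same time are guaranteed distinct addresses, so == and != work. Comparing &a < &b, however, is undefined behavior in C, because a and b are not elements of the same array.

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Hypothetical example: prints the addresses of three locals and reports
     * whether they are pairwise distinct. */
    static int locals_are_distinct(void) {
        int a = 1, b = 2, c = 3;

        /* Taking and printing the addresses forces real storage; the order
         * shown is whatever this particular build happened to choose. */
        printf("&a = %p\n&b = %p\n&c = %p\n", (void *)&a, (void *)&b, (void *)&c);

        /* Equality comparison of pointers to different objects is well defined;
         * relational comparison (<, >) of them would not be. */
        return &a != &b && &b != &c && &a != &c;
    }

    int main(void) {
        assert(locals_are_distinct());
        return 0;
    }
    ```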